
Hive vs Iceberg Tables in AWS Athena

Choosing the Best Option for Your Data Pipelines with dbt

How to build a cost-effective and robust streaming data pipeline

Imagine you're tasked with managing clickstream data collected with Snowplow. In this blog post, we'll walk you through our solution step by step.

Spark Your Infrastructure: Using Terraform to Deploy an AWS Glue PySpark Job

Tired of manually provisioning and managing your infrastructure? Then it's time to adopt best practices and treat your infrastructure as code. In this blog post, we'll dive into Infrastructure as Code (IaC) using one of the most popular tools available: Terraform.

By the end of this post, you'll understand how to use Terraform to deploy your AWS Glue PySpark jobs, giving you a more automated and scalable infrastructure. So, let's get started and spark your infrastructure!

How to query your S3 Data Lake using Athena within an AWS Glue Python shell job

AWS Glue, the serverless ETL service of AWS, supports several job types, including Spark and Python shell jobs. In this article, we'll focus on Python shell jobs and explain how to make the most of your S3 Data Lake by querying it with Athena from within a Python shell job.