The Data Life Podcast
Sanket Gupta
27 episodes
1 week ago
This is a podcast where we talk all about real-life experiences of dealing with data and machine learning tools, techniques and personalities. We cover not just the technical aspects but also the "life" aspects of working in the field. Note: Opinions expressed are my own and do not express the views or opinions of my employer.
Technology
26: Building Data Engineering Pipelines at Scale (with Data Warehouse, Spark and Airflow)
39 minutes 30 seconds
4 years ago

Imagine you are at a beach, watching the waves come and go and looking at all the shells on the sand. Then you get an idea: how about collecting these shells and making necklaces to sell? How would you go about it? Maybe you'd collect a few shells, make a small necklace, and show it to a friend. This is where we begin our journey into data engineering pipelines.

Using the example of running a shell-necklace business, we learn about the following data engineering concepts:

1. ETL (Extract, Transform, Load) vs. ELT (Extract, Load, Transform), and why data warehouses are great for analytics.

2. Spark for large-scale data processing, and options for hosting and running it

3. Data orchestration using Airflow
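
To make the ETL vs. ELT distinction in item 1 concrete, here is a minimal sketch, not taken from the episode. It assumes an in-memory SQLite database as a stand-in for a real data warehouse such as Redshift or Snowflake, and the sales records, table names, and function names are all hypothetical. In ETL the rows are cleaned in Python before loading; in ELT the raw rows are loaded first and cleaned with SQL inside the database.

```python
import sqlite3

# Hypothetical raw records for the shell-necklace business (made up for illustration).
RAW_SALES = [
    ("necklace", "  Small ", "12.50"),
    ("necklace", "LARGE", "30.00"),
    ("necklace", "small", "12.50"),
]


def etl(conn, rows):
    """ETL: clean the rows in Python first, then load only the cleaned result."""
    cleaned = [(item, size.strip().lower(), float(price)) for item, size, price in rows]
    conn.execute("CREATE TABLE sales_clean_etl (item TEXT, size TEXT, price REAL)")
    conn.executemany("INSERT INTO sales_clean_etl VALUES (?, ?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE sales_raw (item TEXT, size TEXT, price TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_clean_elt AS
        SELECT item, lower(trim(size)) AS size, CAST(price AS REAL) AS price
        FROM sales_raw
        """
    )


def revenue_by_size(conn, table):
    """Run the same analytics query against either cleaned table."""
    query = f"SELECT size, SUM(price) FROM {table} GROUP BY size ORDER BY size"
    return conn.execute(query).fetchall()


# SQLite in memory stands in for the warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
etl(conn, RAW_SALES)
elt(conn, RAW_SALES)

etl_result = revenue_by_size(conn, "sales_clean_etl")
elt_result = revenue_by_size(conn, "sales_clean_elt")
print(etl_result)
print(elt_result)
```

Both paths produce the same cleaned table. The practical difference is that the ELT path keeps `sales_raw` inside the database, so the transformation can be revised and re-run later in SQL without extracting the source data again.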


My blog on Towards Data Science about moving from Pandas to Spark: https://towardsdatascience.com/moving-from-pandas-to-spark-7b0b7d956adb 

Great book to learn about Spark: https://www.amazon.com/dp/1492050040/?tag=omnilence-20 

Tools covered in the episode: 

dbt: https://www.getdbt.com/ 

Databricks: https://databricks.com/

EMR: https://aws.amazon.com/emr/

AWS Redshift: https://aws.amazon.com/redshift/

Snowflake: https://www.snowflake.com/

Delta Lake: https://databricks.com/product/delta-lake-on-databricks 
