
Learn the core concepts of data engineering, from building robust ETL pipelines and designing data warehouses to optimizing SQL queries and orchestrating complex workflows with Airflow.
Data Engineering: ETL Extract
After this session, you'll be able to explain why ETL is crucial in today's data-driven world and the challenges of extracting raw data.
5 min
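As a taste of the session, here is a minimal extraction sketch in Python. The CSV source, column names, and `extract_rows` helper are all hypothetical; a real pipeline would pull from an API, database, or file share instead of an in-memory string.

```python
import csv
import io

def extract_rows(csv_text):
    """Extract raw records from a CSV source into a list of dicts.

    Extraction deliberately keeps values as-is (strings, stray
    whitespace and all); cleaning belongs to the transform stage.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)

# Hypothetical raw source data -- messy values are typical at extraction time.
raw = "id,amount\n1,19.99\n2, 5.00\n"
rows = extract_rows(raw)
print(rows)
```

Note that the raw values stay untouched: deciding what to fix is a transformation concern, which the next session covers.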
Data Transformation Essentials
After this session, you'll be able to describe common data transformation steps and why they're essential for data quality.
5 min
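To preview the kinds of steps the session covers, here is a hedged sketch of a transform pass: trim whitespace, cast types, and drop records that fail validation. The field names and the `transform` helper are illustrative assumptions, not a fixed API.

```python
def transform(rows):
    """Apply common transformation steps to raw extracted records:
    trim whitespace, cast amounts to float, drop malformed rows."""
    cleaned = []
    for row in rows:
        try:
            value = float(row.get("amount", "").strip())
        except ValueError:
            continue  # drop records whose amount cannot be parsed
        cleaned.append({"id": row["id"].strip(), "amount": round(value, 2)})
    return cleaned

# Hypothetical dirty input: stray whitespace and one unparseable amount.
dirty = [{"id": " 1 ", "amount": "19.99"}, {"id": "2", "amount": "n/a"}]
clean = transform(dirty)
print(clean)
```

Dropping bad rows silently, as done here, is only one policy; real pipelines often quarantine rejects for later inspection instead.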
Data Quality & Governance
After this session, you'll be able to define key data quality dimensions and explain why data governance is vital for reliable insights.
5 min
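As a small illustration of quality dimensions, the sketch below checks completeness, uniqueness, and validity on a list of records. The dimension names follow common usage, but the `quality_report` function and its field assumptions are hypothetical.

```python
def quality_report(rows, key_field):
    """Score three common data-quality dimensions for a list of dicts:
    completeness (key present), uniqueness (no duplicate keys), and
    validity (amount is numeric)."""
    keys = [r.get(key_field) for r in rows]
    return {
        "completeness": sum(k is not None for k in keys) / len(rows),
        "uniqueness": len(set(keys)) == len(keys),
        "validity": all(isinstance(r.get("amount"), (int, float)) for r in rows),
    }

records = [
    {"id": 1, "amount": 19.99},
    {"id": 2, "amount": 5.0},
    {"id": None, "amount": "bad"},  # fails completeness and validity
]
report = quality_report(records, "id")
print(report)
```

Governance is what turns checks like these into policy: who owns each dataset, which thresholds block a pipeline, and how failures are escalated.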
Airflow & DAGs
After this session, you'll be able to explain the purpose of Apache Airflow and how DAGs are used to define data pipelines.
5 min
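Airflow itself needs a running scheduler, so this sketch instead shows the underlying idea a DAG encodes, using only the standard library: tasks plus dependencies resolved into a valid execution order. The task names mirror a hypothetical extract/transform/load pipeline.

```python
from graphlib import TopologicalSorter

# Each key is a task; the set lists the tasks it depends on.
# This mirrors how an Airflow DAG declares upstream dependencies.
dag = {
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# A valid run order: every task appears after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # "extract" is always first, "load" always last
```

Airflow adds scheduling, retries, and monitoring on top of this ordering, but dependency resolution over a directed acyclic graph is the core concept.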
SQL Performance Tuning
After this session, you'll be able to identify common SQL performance bottlenecks and explain how indexing can dramatically speed up queries.
5 min
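The effect of an index is easy to see with SQLite, which ships with Python. The table and index names below are made up for the demo; `EXPLAIN QUERY PLAN` shows the planner switching from a full scan to an index search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, i % 100, 9.99) for i in range(1000)])

query = "SELECT * FROM orders WHERE customer_id = 7"

# Without an index, the planner must scan every row.
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()
print(plan_before)

con.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# With the index, the planner can jump straight to matching rows.
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()
print(plan_after)
```

On large tables this is the difference between O(n) scans and O(log n) lookups, which is why missing indexes are the first bottleneck to check.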
OLTP vs OLAP & Star Schema
After this session, you'll be able to explain the difference between OLTP and OLAP systems and recognize the components of a star schema.
5 min
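A star schema is concrete enough to sketch in a few lines of SQL, again via SQLite. The table and column names are invented for the example: a central fact table of sales measures surrounded by dimension tables of descriptive attributes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, day TEXT);
    -- The fact table holds measures plus a foreign key per dimension.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        amount       REAL
    );
    INSERT INTO dim_customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO dim_date VALUES (10, '2024-01-01');
    INSERT INTO fact_sales VALUES (1, 10, 50.0), (2, 10, 25.0), (1, 10, 25.0);
""")

# A typical OLAP query: aggregate measures, grouped by dimension attributes.
totals = con.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer c USING (customer_key)
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(totals)
```

An OLTP system would normalize further to keep writes cheap; the star shape instead optimizes for exactly these read-heavy aggregate joins.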
Apache Spark Explained
After this session, you'll be able to explain why Apache Spark is essential for big data processing and its core conceptual components.
5 min
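Since PySpark needs a cluster (or at least a local Spark install), this sketch imitates Spark's core model in plain Python: map each record to key/value pairs, shuffle by key, then reduce per key. The `map_reduce` helper is a single-process stand-in, not Spark's API.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Single-process imitation of Spark's model: map each record to
    (key, value) pairs, group values by key (the "shuffle"), then
    reduce each key's values. Spark distributes these same phases
    across a cluster of machines."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return {key: reducer(values) for key, values in shuffled.items()}

# The classic word count, Spark's "hello world".
lines = ["spark makes big data", "big data needs spark"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=sum,
)
print(counts)
```

What makes Spark essential is everything this sketch omits: partitioning data across nodes, keeping intermediate results in memory, and recovering from worker failures.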
Understand the fundamental stages of an ETL pipeline and their purpose.
Design basic data warehouse schemas like star and snowflake for analytical efficiency.
Apply SQL optimization techniques to improve query performance on large datasets.
Grasp the basic principles of distributed data processing with Apache Spark.
Implement strategies for ensuring high data quality and consistency.
Describe how data orchestration tools like Airflow manage complex data workflows.
Identify common challenges in data engineering and how to address them.