Course Description
Data Engineering is one of the fastest-growing and highest-paying roles in tech — and this course is your complete roadmap to breaking in. Whether you're a student, a fresh graduate, or switching from another field, this course is designed to take you from zero experience to production-ready data engineer in 24 structured classes.
You'll start with the fundamentals — Linux, Docker, and Python — then progressively build up to advanced database engineering with PostgreSQL and PL/pgSQL, dimensional modeling and Data Warehouse design, and full ETL pipeline development with real-world projects.
In the second half, you'll master distributed systems. Apache Spark for large-scale batch and streaming processing, and Apache Kafka for real-time event pipelines. Every section ends with a hands-on project, and the course closes with a full capstone — a live Kafka + Spark + PostgreSQL pipeline deployed with Docker Compose.
Your instructor, is a Senior Data Engineer with 6+ years of industry experience, where he has built production ETL systems, data warehouses, and real-time pipelines at scale.
What you'll learn
- Use Linux and Docker to set up data engineering environments
- Design and query PostgreSQL databases with advanced SQL and PL/pgSQL
- Develop end-to-end ETL and ELT pipelines with data quality checks
- Stream real-time events using Apache Kafka with producers and consumers
- Write Python scripts and ETL pipelines from scratch
- Build production-grade Data Warehouses with star schema and SCD patterns
- Process large datasets at scale using Apache Spark and PySpark
- Deploy complete multi-service data stacks using Docker Compose
This course includes:
- 48 hours on-demand video
- 2 assignments
- 24 live classes
- Access on mobile and TV
- Certificate of completion
Course Content
-
Linux Essentials for Data Engineers
-
Docker Fundamentals for Data Pipelines
-
Linux Essentials Commands
-
Docker Commands
-
Python Fundamentals — Data Types, Control Flow & Functions
-
File I/O, Pandas Basics, and Pipeline Packaging
-
Advanced SQL — Window Functions, CTEs, and Query Internals
-
PL/pgSQL — Functions, Procedures, and Cursors
-
Triggers, Audit Systems, and Row-Level Security
-
Query Optimization, VACUUM, and Table Partitioning
-
DWH Concepts — Star Schema, Snowflake, and ODS vs DWH
-
Slowly Changing Dimensions — SCD Type 1, 2, 3, 4, and Hybrid
-
Hands-On DWH Build — Full Implementation in PostgreSQL
-
ETL vs ELT — Pipeline Architecture and Design Patterns
-
Building an ETL Framework in Python + Scheduling
-
Data Quality, Error Handling, Logging, and Alerting
-
End-to-End ETL Project — Multi-Source to Data Warehouse
-
Spark Architecture — Driver, Executors, DAG, and Cluster Modes
-
PySpark DataFrames + Spark SQL — Deep Dive
-
Spark Performance — Partitioning, Caching, and Best Practices
-
Spark Structured Streaming
-
Spark Project — Large-Scale Batch Pipeline + Optimization
-
Kafka Architecture — Brokers, Topics, Partitions, and Offsets
-
Producers, Consumers, Kafka Connect + Schema Registry
-
Kafka Best Practices — Reliability, Ordering, and Exactly-Once
-
Capstone — Kafka + Spark Streaming End-to-End Pipeline
Requirements
- Basic computer skills and internet connection
- Familiarity with any programming language is helpful but not required
- No prior data engineering or database experience needed
- A laptop capable of running Docker (8GB RAM recommended)
Student Feedback
None
Review
No reviews yet. Be the first!
Please sign in to write a review.
Instructor
ByteCraft Instructor Team
Senior Data Engineer & BI SpecialistExperienced Data Professional | Lead Instructor at ByteCraft Studio