Migrate a Legacy Warehouse to a Lakehouse for an Edtech AI Platform
Overview
What this challenge is about.
You receive a Postgres dump of roughly 50 GB and the dbt models that currently produce the student-attempts mart. Land the raw data in object storage (S3 or GCS) as Parquet partitioned by event date, rebuild the equivalent transformation with Spark or DuckDB on top of an open table format (Delta or Iceberg), and benchmark a representative analytical query (cohort retention by topic). Success means three things: the same logical output as the Postgres pipeline, an end-to-end runtime under 30 minutes on a documented cluster, and a side-by-side cost comparison.
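One way to make "same logical output" testable is to compare the two result sets as multisets of rows, ignoring row order and small floating-point differences. The sketch below is a minimal illustration, not a prescribed method: the function names, the rounding tolerance, and the example column layout are all assumptions, and the rows are expected to have been fetched already from each pipeline.

```python
from collections import Counter
from decimal import Decimal


def normalize(row, float_places=6):
    """Turn one row into a hashable tuple, rounding floats and Decimals.

    Rounding is a simple tolerance; values that sit right on a rounding
    boundary can still compare as different.
    """
    out = []
    for value in row:
        if isinstance(value, (float, Decimal)):
            out.append(round(float(value), float_places))
        else:
            out.append(value)
    return tuple(out)


def diff_rows(legacy_rows, lakehouse_rows):
    """Compare two result sets as multisets, so row order does not matter."""
    legacy = Counter(normalize(r) for r in legacy_rows)
    lake = Counter(normalize(r) for r in lakehouse_rows)
    return {
        "missing_in_lakehouse": legacy - lake,  # rows only the legacy mart has
        "extra_in_lakehouse": lake - legacy,    # rows only the lakehouse mart has
    }


def same_logical_output(legacy_rows, lakehouse_rows):
    """True when both pipelines produced the same rows the same number of times."""
    diff = diff_rows(legacy_rows, lakehouse_rows)
    return not diff["missing_in_lakehouse"] and not diff["extra_in_lakehouse"]
```

Because the comparison counts duplicates, a join that silently fans out rows in the new pipeline shows up under `extra_in_lakehouse` rather than passing unnoticed.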
The Brief
What you'll do, and what you'll demonstrate.
Prove that migrating one critical mart from Postgres to a lakehouse cuts runtime to under 30 minutes and lowers cost per query at the next growth tier.
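The two claims in the brief, runtime under 30 minutes and lower cost per query, can be reduced to simple arithmetic once each run has been measured. The sketch below assumes a deliberately crude cost model (hourly node price times node count times runtime) and ignores storage, egress, and idle time; the class, its fields, and every number in the usage lines are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunProfile:
    """One measured pipeline run on a documented cluster (fields are assumptions)."""
    name: str
    node_hourly_usd: float
    node_count: int
    runtime_minutes: float

    def cost_per_run_usd(self) -> float:
        # Compute-only cost: price per node-hour x nodes x hours of runtime.
        return self.node_hourly_usd * self.node_count * self.runtime_minutes / 60.0


def meets_brief(candidate: RunProfile, baseline: RunProfile,
                budget_minutes: float = 30.0) -> bool:
    """True when the candidate is under the time budget and cheaper per run."""
    return (candidate.runtime_minutes < budget_minutes
            and candidate.cost_per_run_usd() < baseline.cost_per_run_usd())


# Placeholder numbers for illustration only; substitute your own measurements.
postgres = RunProfile("postgres", node_hourly_usd=2.0, node_count=1, runtime_minutes=90)
lakehouse = RunProfile("lakehouse", node_hourly_usd=0.5, node_count=4, runtime_minutes=20)
print(meets_brief(lakehouse, postgres))
```

A fuller comparison would add storage and request costs to each profile, since object storage and a managed Postgres instance are billed very differently.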
Earning criteria — what you'll demonstrate
- Model an open-table-format dataset (Delta or Iceberg) with appropriate partitioning
- Translate dbt-style transformations to a lakehouse query engine (Spark or DuckDB)
- Benchmark and explain analytical query performance fairly
- Plan a low-risk warehouse-to-lakehouse migration
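For the benchmarking criterion, a fair comparison usually means running the same query several times on each system, discarding warm-up runs, and reporting a robust statistic rather than a single timing. The harness below is a minimal sketch under those assumptions; `benchmark` and its parameters are hypothetical names, and `run_query` stands for whatever callable executes the cohort-retention query on the system under test.

```python
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[], object], warmups: int = 1, repeats: int = 5) -> dict:
    """Time a query callable: untimed warm-up runs first, then timed repeats."""
    for _ in range(warmups):
        run_query()  # warms caches and connections; not included in timings

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)

    return {
        "runs": repeats,
        "median_s": statistics.median(timings),  # less sensitive to one slow run
        "min_s": min(timings),
        "max_s": max(timings),
    }
```

Reporting the minimum and maximum next to the median makes run-to-run variance visible, which helps when explaining why one engine looks faster than the other.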
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Data Engineer
Lakehouse migrations are among the most common data-engineering projects right now; shipping one proof of concept end-to-end is a strong portfolio piece.
This challenge sharpens
- lakehouse-architecture
- delta-lake
- spark
AI Solutions Architect
Writing a cost-and-risk migration plan a CTO can sign off on is core solutions-architect output.
This challenge sharpens
- data-modeling
- performance-benchmarking
- lakehouse-architecture
MLOps Engineer
Unblocking the morning retraining job is a quintessential MLOps win; the lakehouse skills carry directly into MLOps platform work.
This challenge sharpens
- spark
- dbt
- performance-benchmarking