Migrate a 200TB Data Lake from Parquet to Iceberg

Free · Verified credential · 4 weeks · Advanced

Overview

What this challenge is about.

You receive an inventory of the 200TB hot tier (about 1,200 tables, with about 38 PB of historical data referenced), the current Spark and Trino read patterns, and six months of schema-change tickets. Design the migration: a per-table strategy (in-place rewrite for small tables, write-once-rewrite-incrementally for the 30 largest), a catalog choice (Glue, Nessie, or REST), and a cutover plan for live writes. Prototype on a 5TB subset (about 80 tables): migrate it, run validation queries through Spark and Trino, and measure the read-performance delta against raw Parquet. Deliver migration scripts, a validation report, a 10-page runbook for the full 200TB, and a 4-page memo on ongoing operational costs.
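The per-table strategy split described above can be sketched in a few lines. This is a minimal illustration, not the challenge's reference solution: the `TableInfo` record, the strategy labels, and the sample inventory are all hypothetical, and the only rule taken from the brief is that the largest tables (30 in the full lake) get the incremental path while the rest are rewritten in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical inventory record: table name and on-disk size."""
    name: str
    size_bytes: int


def assign_strategies(tables, n_incremental=30):
    """Map each table name to a migration strategy label.

    The n_incremental largest tables get the incremental path;
    every other table is rewritten in place.
    """
    ranked = sorted(tables, key=lambda t: t.size_bytes, reverse=True)
    largest = {t.name for t in ranked[:n_incremental]}
    return {
        t.name: "incremental_rewrite" if t.name in largest else "in_place_rewrite"
        for t in tables
    }


# Made-up inventory; n_incremental=1 keeps the example small.
inventory = [
    TableInfo("events", 40 * 10**12),
    TableInfo("orders", 2 * 10**12),
    TableInfo("dim_country", 5 * 10**6),
]
plan = assign_strategies(inventory, n_incremental=1)
```

In a real plan, size is unlikely to be the only input: write frequency and the number of downstream readers would probably shift some tables between the two groups.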

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Migrate a 200TB Parquet data lake to Iceberg with no read regression and a runbook the data-platform team can execute table-by-table.
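One way to make "no read regression" checkable is to compare median query latencies before and after migration. The sketch below assumes you have already collected wall-clock timings in seconds for the same query against raw Parquet and against Iceberg. The function names and the 5% tolerance are assumptions chosen for illustration, not thresholds set by the challenge.

```python
from statistics import median


def read_perf_delta(parquet_secs, iceberg_secs):
    """Relative change in median latency; positive means Iceberg is slower."""
    base = median(parquet_secs)
    return (median(iceberg_secs) - base) / base


def has_regression(parquet_secs, iceberg_secs, tolerance=0.05):
    """Flag a regression when Iceberg is slower by more than the tolerance."""
    return read_perf_delta(parquet_secs, iceberg_secs) > tolerance


# Made-up timings for one validation query, three runs per format.
delta = read_perf_delta([10.0, 12.0, 11.0], [9.0, 10.0, 11.0])
regressed = has_regression([10.0, 12.0, 11.0], [9.0, 10.0, 11.0])
```

Medians are used here because a single cold-cache run can skew a mean. A fuller report would likely also track tail latency per engine.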

Earning criteria — what you'll demonstrate

Choose the right Iceberg catalog for a multi-engine read pattern
Design per-table migration strategies for varying scale
Validate that schema evolution + time travel work end-to-end
Produce a runbook the data platform team can execute table-by-table
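The schema-evolution and time-travel criterion could be exercised with a small check: record a snapshot ID and row count, add a column, then confirm that a time-travel read of the old snapshot still returns the recorded count in both engines. The helper below only builds the SQL text, as a sketch. The table name and snapshot ID are hypothetical, and the syntax assumes Spark 3.3 or later with the Iceberg runtime (`VERSION AS OF`) and the Trino Iceberg connector (`FOR VERSION AS OF`). How the table is addressed in each engine depends on your catalog configuration.

```python
def add_column_sql(table, column, sql_type):
    """Schema-evolution statement: add a new column to the table."""
    return f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}"


def time_travel_count_sql(engine, table, snapshot_id):
    """Row count as of a given Iceberg snapshot, in the engine's own syntax."""
    clause = {"spark": "VERSION AS OF", "trino": "FOR VERSION AS OF"}
    if engine not in clause:
        raise ValueError(f"unsupported engine: {engine}")
    return f"SELECT count(*) FROM {table} {clause[engine]} {snapshot_id}"


# Hypothetical table and snapshot ID, for illustration only.
ddl = add_column_sql("lake.db.events", "device_type", "varchar")
spark_sql = time_travel_count_sql("spark", "lake.db.events", 1234567890)
trino_sql = time_travel_count_sql("trino", "lake.db.events", 1234567890)
```

Running the two count queries after the `ALTER TABLE` and comparing both results with the count recorded beforehand gives one end-to-end data point per table.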

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Big Data and Data-Intensive Systems

Master · CS SE

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Design

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career mappings coming soon.

One more thing

You can put a credential on your CV by Friday.
