Design

Migrate a 200TB Data Lake from Parquet to Iceberg

Free | Verified credential | 4 weeks | Advanced

Overview

What this challenge is about.

You receive an inventory of the 200TB hot tier (around 1,200 tables, with around 38 PB of historical data referenced), the current Spark + Trino read patterns, and 6 months of schema-change tickets. Design the migration: a per-table strategy (in-place rewrite for small tables vs. write-once-rewrite-incrementally for the 30 largest), a catalog choice (Glue vs. Nessie vs. REST), and a cutover plan for live writes. Prototype on a 5TB subset (around 80 tables): migrate it, run validation queries via Spark + Trino, and measure the read-performance delta vs. raw Parquet. Deliver migration scripts, a validation report, a 10-page runbook for the full 200TB, and a 4-page memo on ongoing operational costs.
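
As a rough illustration of the per-table strategy split described above, the sketch below ranks an inventory by size and routes the largest tables to the incremental path and everything else to an in-place rewrite. The `TableInfo` type, the strategy labels, and the sample table names are hypothetical; only the "30 largest" cutoff comes from the brief, and a real plan would likely weigh write rate and partition layout as well as size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical inventory row: table name and on-disk size in TB."""
    name: str
    size_tb: float


def assign_strategies(tables, n_largest=30):
    """Map each table name to a migration strategy label.

    The n_largest tables by size get the incremental path; all others
    get an in-place rewrite. Labels are illustrative, not Iceberg terms.
    """
    ranked = sorted(tables, key=lambda t: t.size_tb, reverse=True)
    largest = {t.name for t in ranked[:n_largest]}
    return {
        t.name: "incremental_rewrite" if t.name in largest else "in_place_rewrite"
        for t in tables
    }


# Toy inventory with a cutoff of 2 instead of 30, to keep the example small.
inventory = [
    TableInfo("events", 40.0),
    TableInfo("orders", 12.5),
    TableInfo("dim_country", 0.001),
]
plan = assign_strategies(inventory, n_largest=2)
```

Here `plan` assigns `events` and `orders` to the incremental path and `dim_country` to the in-place rewrite.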

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Migrate a 200TB Parquet data lake to Iceberg with no read regression and a runbook the data-platform team can execute table-by-table.

Earning criteria — what you'll demonstrate

  • Choose the right Iceberg catalog for a multi-engine read pattern
  • Design per-table migration strategies for varying scale
  • Validate that schema evolution + time travel work end-to-end
  • Produce a runbook the data-platform team can execute table-by-table
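
One way to make the "no read regression" goal checkable is to compare median query latencies before and after migration. The sketch below is a minimal version of that idea; the function names, the sample timings, and the 5% tolerance are assumptions for illustration, not values specified by the challenge.

```python
from statistics import median


def read_perf_delta(parquet_secs, iceberg_secs):
    """Relative change in median latency; negative means Iceberg is faster."""
    base = median(parquet_secs)
    new = median(iceberg_secs)
    return (new - base) / base


def is_regression(delta, tolerance=0.05):
    """Flag a regression when Iceberg is slower by more than the tolerance.

    The 5% default is an assumed threshold, not one taken from the brief.
    """
    return delta > tolerance


# Made-up timings (seconds) for one validation query run three times per format.
parquet_runs = [10.0, 12.0, 11.0]
iceberg_runs = [9.0, 9.9, 10.5]

delta = read_perf_delta(parquet_runs, iceberg_runs)
regressed = is_regression(delta)
```

With these sample numbers the Iceberg median is roughly 10% lower than the Parquet median, so the query is not flagged. Running the same comparison per query and per engine (Spark and Trino) would give the table-level view a validation report needs.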

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.
