Design a Change-Data-Capture Pipeline for an E-Commerce Reseller
Overview
What this challenge is about.
You receive the MySQL schema (220 tables), 7 days of binlog samples, and the data team's freshness and correctness requirements. Design the CDC pipeline: Debezium for MySQL binlog capture, Kafka as the transport, and a sink connector (Kafka Connect to ClickHouse, or the Snowflake equivalent) for the warehouse. The design must handle schema changes, large transactions, snapshotting for the initial load, and per-table ordering guarantees. Prototype on the 8 highest-traffic tables (around 60 percent of write volume). Replay a representative 24-hour load and measure end-to-end freshness, correctness (warehouse row counts match the source), and operator overhead. Deliver a Docker Compose stack, the connector configs, the replay report, and a 6-page rollout plan for the remaining 212 tables.
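As a starting point for the connector configs, here is a minimal sketch of a Debezium MySQL source registration payload, built in Python so it can be serialized and POSTed to the Kafka Connect REST endpoint (`/connectors`). The property names follow Debezium 2.x (`topic.prefix`, `schema.history.internal.*`); older 1.x releases use `database.server.name` and `database.history.*` instead. The connector name, host names, credentials, server id, and table names are all hypothetical placeholders, not values from the brief.

```python
import json

# Hypothetical names for the 8 priority tables; the real list comes from the binlog samples.
PRIORITY_TABLES = [
    "shop.orders", "shop.order_items", "shop.payments", "shop.inventory",
    "shop.carts", "shop.cart_items", "shop.shipments", "shop.customers",
]


def build_debezium_mysql_payload(tables, server_id=184054):
    """Return a Kafka Connect registration payload for a Debezium MySQL source."""
    return {
        "name": "reseller-mysql-cdc",  # assumed connector name
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "tasks.max": "1",  # the MySQL connector reads the binlog with a single task
            "database.hostname": "mysql",      # assumed Compose service name
            "database.port": "3306",
            "database.user": "debezium",       # assumed replication user
            "database.password": "change-me",  # placeholder; inject a real secret at deploy time
            "database.server.id": str(server_id),  # must be unique among replication clients
            "topic.prefix": "reseller",        # topics become reseller.<db>.<table>
            "table.include.list": ",".join(tables),
            "snapshot.mode": "initial",        # snapshot once, then stream the binlog
            "snapshot.locking.mode": "minimal",
            "include.schema.changes": "true",  # publish DDL events to a schema-change topic
            "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
            "schema.history.internal.kafka.topic": "schema-history.reseller",
        },
    }


payload = build_debezium_mysql_payload(PRIORITY_TABLES)
payload_json = json.dumps(payload, indent=2)
```

Debezium writes one topic per table and keys each record by the table's primary key, so ordering holds per key within a partition. Whether that satisfies the per-table ordering requirement depends on the partition count chosen for each topic, which this sketch leaves open.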
The Brief
What you'll do, and what you'll demonstrate.
Ship a Debezium + Kafka CDC pipeline that mirrors MySQL to a warehouse with under-1-minute freshness on 8 priority tables, and a rollout plan for the remaining 212.
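One way to check the under-1-minute target after a replay is sketched below. It assumes each sampled row carries the source commit time (for example Debezium's `source.ts_ms`) and the time the row became queryable in the warehouse. Treating p99 lag as the number compared against the 60-second target is an assumption, as are the `ReplaySample` type and the function names; the data team's requirements may define freshness differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplaySample:
    table: str
    source_ts_ms: int     # commit time at the source, in epoch milliseconds
    warehouse_ts_ms: int  # time the row became queryable in the warehouse


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def freshness_report(samples, slo_ms=60_000):
    """Summarize end-to-end lag and compare p99 against the freshness target."""
    lags = [s.warehouse_ts_ms - s.source_ts_ms for s in samples]
    p99 = percentile(lags, 99)
    return {
        "p50_ms": percentile(lags, 50),
        "p99_ms": p99,
        "max_ms": max(lags),
        "meets_slo": p99 < slo_ms,
    }


def row_count_mismatches(source_counts, warehouse_counts):
    """Return {table: (source_rows, warehouse_rows)} for every table whose counts differ."""
    return {
        table: (rows, warehouse_counts.get(table, 0))
        for table, rows in source_counts.items()
        if warehouse_counts.get(table, 0) != rows
    }
```

A row-count match is a coarse correctness signal: it catches dropped or duplicated rows but not rows with wrong column values, so a replay report may want to pair it with per-table checksums.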
Earning criteria — what you'll demonstrate
- Design and operate a Debezium-based CDC pipeline end-to-end
- Handle schema changes, large transactions, and initial snapshotting
- Validate freshness + correctness with realistic replays
- Plan a 220-table rollout that doesn't melt the source database
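For the rollout criterion, one possible way to keep snapshot load on the source bounded is to onboard the remaining tables in waves. The sketch below is a simple greedy planner under stated assumptions: row count stands in for snapshot cost, and the two caps (`max_rows_per_wave`, `max_tables_per_wave`) are hypothetical knobs that would be tuned from the prototype's measurements. It is an illustration, not a prescribed method.

```python
def plan_waves(table_rows, max_rows_per_wave, max_tables_per_wave):
    """Group tables into onboarding waves, largest tables first.

    table_rows maps table name -> approximate row count (assumed proxy for
    snapshot cost). A table larger than max_rows_per_wave gets a wave to itself.
    """
    waves, current, current_rows = [], [], 0
    ordered = sorted(table_rows.items(), key=lambda kv: kv[1], reverse=True)
    for table, rows in ordered:
        over_rows = bool(current) and current_rows + rows > max_rows_per_wave
        over_count = len(current) >= max_tables_per_wave
        if over_rows or over_count:
            # Close the current wave before it exceeds either cap.
            waves.append(current)
            current, current_rows = [], 0
        current.append(table)
        current_rows += rows
    if current:
        waves.append(current)
    return waves
```

For example, `plan_waves({"a": 100, "b": 60, "c": 50, "d": 10}, 110, 2)` yields three waves: `a` alone, then `b` with `c`, then `d`. A real plan would likely also weigh write volume and downstream dependencies between tables.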
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.