Build a Scalable-System-Design Spec for a Streaming-Ingest Pipeline
Overview
What this challenge is about.
You receive the current architecture (Kafka 3.6, Flink 1.18, ClickHouse 24.x), four weeks of production metrics (per-topic throughput, partition skew, Flink operator backpressure, ClickHouse merge load), and the 12-month customer-growth forecast. From these, build a capacity model that gives predicted events/sec by month, a p99 latency budget per stage, the partition counts needed, and the ClickHouse shard counts. Define the partitioning strategy (customer_id-based vs. composite-key) and the hot-key handling (sticky-key sub-partitioning). Define four operational SLOs: ingest lag, processing lag, query p99, and lossy-on-failure rate. Prototype the bottleneck component (likely the Flink stateful operator under hot-key load) and benchmark it at 200k events/sec to validate the model. Deliver the architecture spec (15 pages), the capacity-model spreadsheet, the bottleneck prototype, the benchmark report, and a 2-page decision readout for the CTO.
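To make the hot-key handling concrete, here is a minimal sketch of one possible reading of sticky-key sub-partitioning: a hot customer_id is split across several sub-keys, and consecutive events stay on the same sub-key for a short batch before rotating. The function names, the `hot_keys` map, the fan-out values, and the `sticky_batch` size are all hypothetical and are not taken from the brief.

```python
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash. Python's built-in hash() is salted per process,
    so it is not suitable for routing decisions that must be repeatable."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(customer_id: str, event_seq: int, num_partitions: int,
          hot_keys: dict[str, int], sticky_batch: int = 100) -> int:
    """Pick a partition for one event.

    hot_keys maps a hot customer_id to its fan-out (number of sub-keys).
    Cold keys have fan-out 1 and always land on a single partition.
    Hot keys stay on one sub-key for `sticky_batch` consecutive events,
    then rotate to the next sub-key (assumed interpretation of "sticky").
    """
    fanout = hot_keys.get(customer_id, 1)
    sub_key = (event_seq // sticky_batch) % fanout
    return stable_hash(f"{customer_id}#{sub_key}") % num_partitions


if __name__ == "__main__":
    hot = {"cust-hot": 8}  # hypothetical hot customer with fan-out 8
    cold_parts = {route("cust-cold", i, 64, hot) for i in range(10_000)}
    hot_parts = {route("cust-hot", i, 64, hot) for i in range(10_000)}
    print("cold key partitions:", len(cold_parts))
    print("hot key partitions: ", len(hot_parts))
```

The trade-off is that any per-customer state downstream is now split across sub-keys, so a stateful operator has to merge partial results for a hot customer in a second stage. That merge cost is one of the things the bottleneck prototype would need to measure.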
The Brief
What you'll do, and what you'll demonstrate.
Produce a scalable-system-design spec to grow a streaming-ingest pipeline from 80k to 1M events/sec, with a prototype that validates the capacity model on the bottleneck component.
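As a rough illustration of the capacity model behind that growth target, the sketch below assumes the load grows geometrically from 80k to 1M events/sec over the 12-month forecast horizon, which works out to roughly 23% month over month. The per-partition throughput, per-shard insert rate, and headroom factor are placeholder assumptions, not figures from the brief; in the real model they would come from the four weeks of production metrics.

```python
import math

START_EPS = 80_000       # from the brief
TARGET_EPS = 1_000_000   # from the brief
MONTHS = 12              # forecast horizon from the overview

# Hypothetical planning inputs; replace with measured values.
PER_PARTITION_EPS = 5_000   # assumed sustainable events/sec per partition
PER_SHARD_EPS = 100_000     # assumed sustainable insert rate per ClickHouse shard
HEADROOM = 0.5              # plan to run at 50% of measured capacity


def monthly_growth(start: float, target: float, months: int) -> float:
    """Constant month-over-month multiplier that takes start to target."""
    return (target / start) ** (1 / months)


def forecast(start: float, target: float, months: int) -> list[float]:
    """Predicted events/sec for month 0..months under geometric growth."""
    g = monthly_growth(start, target, months)
    return [start * g ** m for m in range(months + 1)]


def units_needed(eps: float, per_unit_eps: float, headroom: float) -> int:
    """Partitions or shards required to carry eps at the given headroom."""
    return math.ceil(eps / (per_unit_eps * headroom))


if __name__ == "__main__":
    for month, eps in enumerate(forecast(START_EPS, TARGET_EPS, MONTHS)):
        partitions = units_needed(eps, PER_PARTITION_EPS, HEADROOM)
        shards = units_needed(eps, PER_SHARD_EPS, HEADROOM)
        print(f"month {month:2d}: {eps:>10,.0f} events/sec, "
              f"{partitions} partitions, {shards} shards")
```

A real forecast is unlikely to be smoothly geometric, and partition skew means the hottest partition, not the average, sets the limit. The sketch is only a starting point for the spreadsheet.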
Earning criteria — what you'll demonstrate
- Build a defensible capacity model for a streaming pipeline
- Pick partitioning and hot-key strategies under real skew
- Define operational SLOs that map to customer impact
- Validate a capacity model with a focused prototype, not a full rebuild
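One way to keep the four operational SLOs tied to customer impact is to record each one as a small structured definition and check observed values against it. The sketch below is illustrative only: the SLO names, thresholds, units, and impact statements are hypothetical placeholders, not targets from the brief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    name: str
    threshold: float       # the observed value must stay at or below this
    unit: str
    customer_impact: str   # what the customer sees when the SLO is breached


# Hypothetical thresholds; the real values belong in the architecture spec.
SLOS = [
    Slo("ingest_lag_p99", 5.0, "s", "new events are late to arrive"),
    Slo("processing_lag_p99", 30.0, "s", "dashboards show stale aggregates"),
    Slo("query_p99", 2.0, "s", "customer-facing queries feel slow"),
    Slo("loss_on_failure_rate", 0.0001, "fraction", "events missing after a failover"),
]


def breaches(observed: dict[str, float]) -> list[str]:
    """Return the names of SLOs whose observed value exceeds the threshold."""
    return [slo.name for slo in SLOS if observed[slo.name] > slo.threshold]


if __name__ == "__main__":
    sample = {
        "ingest_lag_p99": 3.2,
        "processing_lag_p99": 41.0,   # over the assumed 30 s threshold
        "query_p99": 1.1,
        "loss_on_failure_rate": 0.0,
    }
    for name in breaches(sample):
        slo = next(s for s in SLOS if s.name == name)
        print(f"BREACH {name}: {slo.customer_impact}")
```

Writing the customer impact next to each threshold makes it easier to argue in the decision readout why a given number was chosen.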
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.