Overview
What this challenge is about.
Implement HyperLogLog with precision parameter p in {12, 14, 16} (4 KB, 16 KB, and 64 KB sketches at one byte per register) and benchmark its relative error on a replayed 3-hour production trace (about 13 billion bid events, about 220 million distinct users). Implement HLL merge (union across hourly buckets) to support 7-day rolling windows. Compare memory and error against a Redis SET baseline. Deliver a Java reference implementation (the DSP runs on the JVM), the precision-vs-memory benchmark, a Redis migration plan, and operational dashboards.
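HyperLogLog registers only ever grow, so an expired hour cannot be subtracted from a combined sketch. One way to support a 7-day rolling window is therefore to retain the hourly buckets and recompute the union on demand. The following is a minimal sketch of that idea, not the DSP's actual design: the class name RollingHllWindow, the one-byte-per-register layout, and the in-memory deque are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical 7-day rolling window over hourly HLL register arrays. */
final class RollingHllWindow {
    static final int WINDOW_HOURS = 7 * 24; // 168 hourly buckets

    private final int m;                                    // registers per sketch, 2^p
    private final Deque<byte[]> hours = new ArrayDeque<>(); // oldest bucket first

    RollingHllWindow(int p) {
        this.m = 1 << p;
    }

    /** Appends the newest closed hourly bucket and evicts the oldest once the window is full. */
    void push(byte[] hourlyRegisters) {
        if (hourlyRegisters.length != m) {
            throw new IllegalArgumentException("precision mismatch");
        }
        hours.addLast(hourlyRegisters.clone());
        if (hours.size() > WINDOW_HOURS) {
            hours.removeFirst();
        }
    }

    /** Union of all retained hours: the register-wise maximum. */
    byte[] union() {
        byte[] out = new byte[m];
        for (byte[] h : hours) {
            for (int i = 0; i < m; i++) {
                if (h[i] > out[i]) out[i] = h[i];
            }
        }
        return out;
    }

    int size() {
        return hours.size();
    }
}
```

The union costs one pass over 168 register arrays per query. Caching daily unions would cut that cost, at the price of extra state to invalidate when an hour expires.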
The Brief
What you'll do, and what you'll demonstrate.
Replace exact distinct-user counting at 1.2M events/sec with HyperLogLog sketches that stay under 2 percent relative error and use at most 64 KB per (campaign, hour) bucket.
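To see how the error and memory targets interact, the arithmetic can be tabulated for each candidate precision. The helper below is an illustrative sketch: the class name HllBudget is hypothetical, and it assumes one byte per register and 168 retained hourly buckets per campaign for the 7-day window.

```java
/** Hypothetical helper that tabulates memory and standard error per precision. */
final class HllBudget {
    static int registers(int p) {
        return 1 << p; // m = 2^p
    }

    /** Assumes one byte per register, which matches the 4/16/64 KB sketch sizes. */
    static int sketchBytes(int p) {
        return registers(p);
    }

    /** Standard relative error of the HLL estimate: 1.04 / sqrt(m). */
    static double standardError(int p) {
        return 1.04 / Math.sqrt(registers(p));
    }

    /** Bytes needed to retain the given number of hourly buckets for one campaign. */
    static long windowBytes(int p, int hours) {
        return (long) sketchBytes(p) * hours;
    }

    public static void main(String[] args) {
        for (int p : new int[] {12, 14, 16}) {
            System.out.printf("p=%d  sketch=%d KB  stdErr=%.4f%%  7-day window=%d KB%n",
                    p, sketchBytes(p) / 1024, 100.0 * standardError(p),
                    windowBytes(p, 7 * 24) / 1024);
        }
    }
}
```

Under these assumptions the standard error is 1.625 percent at p = 12, 0.8125 percent at p = 14, and 0.40625 percent at p = 16, and a full 7-day window at p = 16 takes 10.5 MB per campaign. The standard error is a one-standard-deviation figure rather than a bound, so individual estimates at p = 12 can still exceed 2 percent.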
Earning criteria — what you'll demonstrate
- Implement HyperLogLog, including bias correction at low cardinalities
- Compute and empirically verify the standard relative error of 1.04 / sqrt(m), where m = 2^p is the number of registers
- Design a merge operation for distributed cardinality across time buckets
- Trade memory against error in a real streaming pipeline
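The first three criteria can be illustrated with a compact sketch. It is not the reference implementation: it assumes numeric user IDs, uses the MurmurHash3 64-bit finalizer as a stand-in hash, stores one byte per register, and applies the linear-counting small-range correction from the original HyperLogLog algorithm rather than empirically fitted bias tables. The class and method names are hypothetical.

```java
/** Minimal HyperLogLog sketch: one byte per register, 64-bit hashes, p in [7, 16]. */
final class HllSketch {
    private final int p;            // precision: number of index bits
    private final int m;            // register count, 2^p
    private final byte[] registers; // maximum rank observed per register

    HllSketch(int p) {
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** Assumed hash: the MurmurHash3 64-bit finalizer applied to a numeric user ID. */
    static long hash64(long x) {
        x ^= x >>> 33;
        x *= 0xff51afd7ed558ccdL;
        x ^= x >>> 33;
        x *= 0xc4ceb9fe1a85ec53L;
        x ^= x >>> 33;
        return x;
    }

    void add(long userId) {
        long h = hash64(userId);
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // remaining 64 - p bits, left-aligned
        // Rank is the 1-based position of the first 1-bit in the remaining bits.
        int rank = (rest == 0) ? (64 - p + 1) : Long.numberOfLeadingZeros(rest) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r); // 2^(-r)
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant, valid for m >= 128
        double raw = alpha * m * m / sum;
        // Small-range correction: use linear counting while empty registers remain.
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }

    /** Union with another sketch of the same precision: register-wise maximum. */
    void merge(HllSketch other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) registers[i] = other.registers[i];
        }
    }
}
```

Because merge takes the register-wise maximum, merging hourly sketches yields exactly the registers a single sketch would hold after seeing every event in those hours, so the union adds no error beyond that of one sketch. With 64-bit hashes, the large-range correction from the original 32-bit formulation is omitted here.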
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.