Overview
What this challenge is about.
Read the HLL paper (Flajolet et al., 2007) and a sample of 30 minutes of customer trace IDs (around 80M events). Implement HLL from scratch in Go (no library imports). Run benchmarks at register-count = 2^10, 2^12, 2^14, 2^16 against the exact distinct count. Analyze the standard error empirically vs. the theoretical 1.04/sqrt(m) bound. Recommend a register count that achieves under 2 percent error at p95 within a 256KB-per-minute memory budget. Deliver Go code, a benchmark notebook, and a 4-page deployment memo.
The Brief
What you'll do, and what you'll demonstrate.
Implement and analyze HyperLogLog in Go, then recommend a register-count configuration that meets a 2 percent p95 error target inside a 256KB-per-minute memory budget.
Earning criteria — what you'll demonstrate
- Implement a non-trivial randomized algorithm from a paper
- Reason about the accuracy/memory trade-off of probabilistic sketches
- Validate theoretical error bounds empirically on real data
- Translate algorithmic choices into production memory budgets
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.