Profile and Tame a P99-Latency Tail for an Ad-Auction Service
Overview
What this challenge is about.
Receive the bidder source (Go, about 22,000 lines), production traces (eBPF data plus flame graphs from 30 minutes of peak traffic), and the host config (two-socket NUMA, 96 cores, 384GB RAM, Go 1.22). Use bcc-tools (offcputime, runqlat, biolatency) plus futex-contention tracing to profile the p99 tail specifically, filtering for slow requests. Identify the four dominant contributors and back each with evidence. Propose a targeted fix for each contributor (GC tuning with GOMEMLIMIT, lock striping or sync.Pool, mlock for hot pages, NUMA-aware scheduling). Implement and validate each fix in isolation, then cumulatively. Run a 1-hour load test at peak QPS and confirm p99 < 25ms without p50 regression. Deliver the profiling artifacts, the per-fix experiment notes, the cumulative-validation report, and an 8-page write-up for the platform-team lead.
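Lock striping, one of the candidate fixes above, replaces a single hot mutex with several independent ones chosen by hashing the key, so goroutines touching unrelated keys stop queuing on the same futex. A minimal sketch, assuming a hypothetical per-campaign spend map: `StripedSpend`, `numStripes`, and the inline FNV-1a hash are illustrative choices, not taken from the bidder source.

```go
package main

import (
	"fmt"
	"sync"
)

// numStripes is an assumed stripe count; tune it against the contention profile.
const numStripes = 64

// stripe pairs one mutex with the portion of the map it guards.
type stripe struct {
	mu    sync.Mutex
	spend map[string]int64
}

// StripedSpend is a hypothetical per-campaign spend tracker. Keys are hashed
// onto numStripes independent locks instead of sharing one global mutex.
type StripedSpend struct {
	stripes [numStripes]stripe
}

func NewStripedSpend() *StripedSpend {
	s := &StripedSpend{}
	for i := range s.stripes {
		s.stripes[i].spend = make(map[string]int64)
	}
	return s
}

// stripeFor picks a stripe with an inline FNV-1a hash (no allocation per call).
func (s *StripedSpend) stripeFor(key string) *stripe {
	const offset32, prime32 = 2166136261, 16777619
	h := uint32(offset32)
	for i := 0; i < len(key); i++ {
		h ^= uint32(key[i])
		h *= prime32
	}
	return &s.stripes[h%numStripes]
}

// Add records spend for a campaign, locking only that campaign's stripe.
func (s *StripedSpend) Add(campaign string, micros int64) {
	st := s.stripeFor(campaign)
	st.mu.Lock()
	st.spend[campaign] += micros
	st.mu.Unlock()
}

// Get returns the recorded spend for a campaign.
func (s *StripedSpend) Get(campaign string) int64 {
	st := s.stripeFor(campaign)
	st.mu.Lock()
	defer st.mu.Unlock()
	return st.spend[campaign]
}

func main() {
	s := NewStripedSpend()
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				s.Add(fmt.Sprintf("campaign-%d", i%10), 1)
			}
		}()
	}
	wg.Wait()
	fmt.Println(s.Get("campaign-3")) // 800: 8 workers x 100 adds each
}
```

The stripe count is a tuning knob: more stripes cut contention but cost memory, and adjacent stripes can still false-share cache lines unless padded. Whether striping beats `sync.Pool` or a redesign depends on what the off-CPU and futex profiles actually show.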
The Brief
What you'll do, and what you'll demonstrate.
Reduce a programmatic ad-auction service's p99 latency from 78ms to under 25ms by attacking the top four tail-latency contributors identified through eBPF profiling.
Earning criteria: what you'll demonstrate
- Use bcc-tools and eBPF to profile tail latency specifically
- Distinguish GC, lock-contention, page-fault, and NUMA-related tails
- Validate fixes in isolation to attribute speedup honestly
- Run a peak-load test that confirms p99 < 25ms without p50 regression
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.