Debug a Race Condition in a Real-Time Collaboration Service
Overview
What this challenge is about.
Deliver four artifacts: harness, write-up, PR, and runbook.
- Harness: build a deterministic load-test harness (k6 or Playwright) that reproduces the corruption within 3 attempts on a clean local stack. Capture traces with OpenTelemetry to localize the race.
- Write-up: write a 4-page root-cause write-up walking through the timeline, the failed invariant, and the fix design.
- PR: ship the fix as a single reviewable PR with a regression test that fails on main.
- Runbook: author a 2-page operator runbook covering detection, immediate mitigation, and rollback.
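To make "deterministic" concrete: a harness reproduces a race reliably when it forces the bad interleaving instead of waiting for load to produce it. The sketch below illustrates that idea in plain Python with asyncio rather than k6 or Playwright. It assumes the corruption is a lost update on a read-modify-write path, which the brief does not confirm. `DocStore`, `run_unsafe`, `run_locked`, and `invariant_holds` are hypothetical names for illustration only.

```python
import asyncio


class DocStore:
    """Hypothetical in-memory stand-in for one shared document."""

    def __init__(self):
        self.text = ""


def invariant_holds(text, chunks):
    # Assumed invariant: every client's edit survives in the final document.
    return sorted(text) == sorted("".join(chunks))


async def run_unsafe(chunks):
    """Force every writer to read before any writer writes."""
    store = DocStore()
    all_read = asyncio.Event()
    pending = len(chunks)

    async def writer(chunk):
        nonlocal pending
        snapshot = store.text          # read
        pending -= 1
        if pending == 0:
            all_read.set()             # last reader releases everyone
        await all_read.wait()          # forced interleaving point
        store.text = snapshot + chunk  # write from a stale snapshot

    await asyncio.gather(*(writer(c) for c in chunks))
    return store.text


async def run_locked(chunks):
    """Same yield point, but the read-modify-write is one critical section."""
    store = DocStore()
    lock = asyncio.Lock()

    async def writer(chunk):
        async with lock:
            snapshot = store.text
            await asyncio.sleep(0)     # yield while holding the lock
            store.text = snapshot + chunk

    await asyncio.gather(*(writer(c) for c in chunks))
    return store.text


if __name__ == "__main__":
    edits = ["a", "b"]
    broken = asyncio.run(run_unsafe(edits))
    fixed = asyncio.run(run_locked(edits))
    print("unsafe invariant holds:", invariant_holds(broken, edits))
    print("locked invariant holds:", invariant_holds(fixed, edits))
```

The unsafe run loses an edit on every execution because the gate guarantees both reads happen before either write. A regression test built the same way fails on the unfixed code path and passes once the read-modify-write is made atomic.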
The Brief
What you'll do, and what you'll demonstrate.
Reproduce, root-cause, and fix a multi-user race condition that strikes about 0.4 percent of the time in a real-time collaboration service, and leave the on-call team better equipped to detect and mitigate it.
Earning criteria — what you'll demonstrate
- Reproduce an intermittent concurrency bug deterministically
- Use distributed tracing to localize a race condition
- Communicate a root cause in writing that non-authors can follow
- Translate a bug fix into an operator runbook
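Localizing a race from traces often comes down to finding two write operations on the same document whose time windows overlap. The sketch below shows that check on flattened span records. It assumes spans have already been exported from OpenTelemetry into simple records; the `Span` fields and the `overlapping_writes` helper are hypothetical and do not reflect the OpenTelemetry data model or the service's actual span names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Span:
    # Hypothetical flattened record; real exported spans carry more fields.
    client: str
    doc_id: str
    start_ms: int  # when the client read the document
    end_ms: int    # when its write was acknowledged


def overlapping_writes(spans):
    """Return pairs of spans from different clients on the same document
    whose read-to-write windows overlap in time."""
    by_doc = {}
    for span in spans:
        by_doc.setdefault(span.doc_id, []).append(span)

    suspects = []
    for doc_spans in by_doc.values():
        ordered = sorted(doc_spans, key=lambda s: s.start_ms)
        # After sorting, `first` never starts later than `second`,
        # so the windows overlap when `second` starts before `first` ends.
        for first, second in combinations(ordered, 2):
            if first.client != second.client and second.start_ms < first.end_ms:
                suspects.append((first, second))
    return suspects


if __name__ == "__main__":
    sample = [
        Span("alice", "doc-1", 0, 30),
        Span("bob", "doc-1", 10, 40),
        Span("carol", "doc-1", 50, 60),
        Span("dave", "doc-2", 5, 25),
    ]
    for first, second in overlapping_writes(sample):
        print(f"{first.client} and {second.client} overlap on {first.doc_id}")
```

Each suspect pair is a candidate entry for the write-up's timeline: it names the two clients, the document, and the window in which the invariant could have been violated.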
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.