Debug Latency Tail With Distributed Tracing on a Logistics SaaS
Overview
What this challenge is about.
Receive 7 days of anonymized trace data in Tempo, the service map (12 services), and the customer complaint log. Investigate: filter the slowest 1 percent of traces, identify the top 5 spans that contribute to tail latency, correlate with infrastructure metrics (CPU throttling, GC pauses, DB connection saturation), and produce a hypothesis-driven diagnosis. Prioritize 3 remediations by expected p99 impact and effort. Implement (or simulate) one remediation and re-measure with traces. Deliver: 12-page investigation report with embedded TraceQL queries, remediation roadmap, and a 5-page follow-up measurement report.
The Brief
What you'll do, and what you'll demonstrate.
Diagnose p99 tail latency on a route-optimize endpoint using distributed traces and validate at least one remediation with measurement.
Earning criteria — what you'll demonstrate
- Filter and explore traces to isolate tail-latency contributors
- Correlate trace spans with infrastructure-level signals
- Translate trace evidence into remediation hypotheses
- Validate fixes with follow-up trace measurements
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.