Overview
What this challenge is about.
Design the observability architecture for 38 services: OpenTelemetry traces exported to Tempo, structured logs in Loki, RED (rate, errors, duration) metrics in Prometheus, and per-service SLOs with burn-rate alerting. Prototype it on 3 representative services (a high-volume sync API, an async worker, and a stateful service), and define SLOs and burn-rate alerts for each. Produce a 12-page architecture document, an instrumentation playbook for the other 35 services, and an alerting policy that does not page on a single transient burst of errors.
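As a rough illustration of what the RED metrics summarize, the sketch below computes rate, error ratio, and a nearest-rank p95 duration over one window of requests. It is plain arithmetic over an in-memory list, not how Prometheus derives these values from counters and histograms; the `Request` type, the `red_summary` function, and the choice of p95 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float  # latency observed for this request
    ok: bool            # False if the request failed from the caller's view


def red_summary(requests: Sequence[Request], window_s: float) -> dict:
    """Summarize one window as rate, error ratio, and p95 duration."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}

    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    # Nearest-rank p95: ceil(0.95 * n), done in integers to avoid float rounding.
    rank = (95 * n + 99) // 100
    errors = sum(1 for r in requests if not r.ok)

    return {
        "rate_per_s": n / window_s,       # R: requests per second
        "error_ratio": errors / n,        # E: fraction of failed requests
        "p95_ms": durations[rank - 1],    # D: 95th percentile latency
    }
```

For example, 100 requests in a 60-second window with 5 failures give an error ratio of 0.05; that ratio is the kind of input an availability SLO would be evaluated against.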
The Brief
What you'll do, and what you'll demonstrate.
Redesign observability for 38 microservices around OpenTelemetry tracing, RED metrics, and SLO-driven burn-rate alerting; prototype on 3 services and document rollout for the rest.
Earning criteria — what you'll demonstrate
- Instrument services with OpenTelemetry without coupling to one backend
- Define SLOs that reflect user-visible reliability, not infra-internal metrics
- Configure burn-rate alerting that avoids paging on transient noise
- Write an instrumentation playbook teams will actually follow
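One way to make "avoids paging on transient noise" concrete is a multi-window burn-rate check: page only when both a long window and a short window are spending the error budget faster than a threshold. The sketch below is a minimal version of that idea under stated assumptions. The function names are hypothetical, and the 14.4 default is borrowed from the commonly cited pairing of a 1-hour and a 5-minute window against a 30-day SLO; it is not a value given in this brief.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are occurring.

    A burn rate of 1.0 means the error budget would last the whole SLO period.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target  # allowed fraction of failed requests
    return error_ratio / error_budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # assumed default, see the paragraph above
) -> bool:
    """Page only if both windows exceed the burn-rate threshold.

    The long window shows the problem is sustained; the short window shows
    it is still happening now. A single burst typically trips only the
    short window, so it does not page.
    """
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn >= threshold and short_burn >= threshold
```

With a 99.9% SLO, a burst that pushes the short-window error ratio to 5% while the long window sits at 0.5% gives burn rates of roughly 50 and 5, so `should_page` returns `False`. If the long window also climbs to 2%, both burn rates exceed 14.4 and it returns `True`.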
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.