
Site Reliability & Observability Challenges

Site Reliability & Observability challenges put you on the hook for keeping production healthy. You'll build the fundamentals (application monitoring, dashboard reading, and performance analysis), instrument services with OpenTelemetry, monitor them with Prometheus and Grafana, and then define what "healthy" means by writing service level objectives (SLOs) and the service level indicators (SLIs) that measure them.

From there you'll handle the harder edges (incident command, on-call runbooks, multi-region failover, and chaos engineering), working the way reliability teams actually operate under pressure. Each challenge you solve earns a verified credential you can share with recruiters.

Recommended Challenges


Presentation · Beginner · New
Run an Incident-Response Tabletop for a Healthtech On-Call Team
Design 3 tabletop scenarios with realistic timeline injects (new information arrives every 5-10 minutes). Run the tabletop in a hybrid format (in-person + remote) with the 8 on-call engineers + 2…
- Incident Response
- Tabletop Exercises
- Incident Command System

How it works

From brief to credential, in six steps.

Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.


Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.

Explore sponsorship