Design SLO-Driven Alerts for a Telco's Subscriber API

Free · Verified credential · 4 weeks · Advanced

Overview

What this challenge is about.

You receive a 90-day RED (Rate, Errors, Duration) metrics export covering the subscriber API's 6 endpoints, plus 38 weeks of paging history. Define an SLO per endpoint (e.g., 99.9 percent availability over 30 days, p95 latency under 220 ms). Compute error budgets and implement multi-window, multi-burn-rate alerts, following the patterns in chapter 5 of Google's Site Reliability Workbook. Author a Prometheus alert rule library and an Alertmanager routing tree, with severity tiers tied to burn-rate speed. Deliverables: a 12-page SLO catalog, an alert rule library (YAML), an 8-page on-call runbook, and a 4-week evaluation comparing paging volume and false-positive rates before and after the change.
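The error-budget and burn-rate arithmetic behind that description can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's reference solution: the tier table (window pairs and budget fractions) follows the commonly cited Site Reliability Workbook example for a 30-day SLO, and the names `BurnRateTier`, `burn_rate_threshold`, and `should_fire` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRateTier:
    severity: str              # e.g. "page" or "ticket" (assumed tier names)
    long_window_hours: float   # window that proves the burn is significant
    short_window_hours: float  # window that proves the burn is still ongoing
    budget_fraction: float     # share of the period's budget spent in the long window


def error_budget(slo_target: float) -> float:
    """Allowed error ratio, e.g. about 0.001 for a 99.9 percent SLO."""
    return 1.0 - slo_target


def burn_rate_threshold(tier: BurnRateTier, period_hours: float) -> float:
    """Burn rate at which `budget_fraction` of the budget is spent in the long window."""
    return tier.budget_fraction * period_hours / tier.long_window_hours


def should_fire(long_error_ratio: float, short_error_ratio: float,
                slo_target: float, threshold: float) -> bool:
    """Fire only when BOTH windows exceed threshold * error budget."""
    limit = threshold * error_budget(slo_target)
    return long_error_ratio > limit and short_error_ratio > limit


PERIOD_HOURS = 30 * 24  # 30-day SLO period

# Assumed tiers, modeled on the workbook's example for a 30-day period.
TIERS = [
    BurnRateTier("page", 1, 5 / 60, 0.02),
    BurnRateTier("page", 6, 0.5, 0.05),
    BurnRateTier("ticket", 72, 6, 0.10),
]

if __name__ == "__main__":
    slo = 0.999
    minutes = error_budget(slo) * PERIOD_HOURS * 60
    print(f"Budget: {minutes:.1f} minutes of full outage per 30 days")
    for tier in TIERS:
        rate = burn_rate_threshold(tier, PERIOD_HOURS)
        print(f"{tier.severity}: burn rate {rate:.1f} over {tier.long_window_hours}h")
```

For a 99.9 percent SLO over 30 days, the budget works out to roughly 43.2 minutes of total outage. The short window is what lets an alert reset quickly once the burn stops, which is the main reason the two-window form tends to page less than a single long window.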

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Redesign alerting around SLOs and multi-window burn-rate alerts to cut non-actionable pages in half without missing real incidents.
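One way to make the "cut non-actionable pages in half without missing real incidents" goal measurable is to summarize paging history before and after the redesign. The sketch below assumes each page has already been labeled actionable or not during review; the `Page` record, its fields, and the sample values in the usage section are hypothetical and are not drawn from the challenge dataset.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Page:
    alert: str        # alert name as it appeared in the pager (hypothetical field)
    actionable: bool  # reviewer's judgment: did a human need to act?


def paging_summary(pages: list[Page], weeks: float) -> dict[str, float]:
    """Paging volume and false-positive rate for one observation period."""
    total = len(pages)
    non_actionable = sum(1 for p in pages if not p.actionable)
    return {
        "pages_per_week": total / weeks,
        "non_actionable_per_week": non_actionable / weeks,
        # Share of pages that needed no action; 0.0 when there were no pages.
        "false_positive_rate": non_actionable / total if total else 0.0,
    }


def missed_incidents(incident_ids: Iterable[str],
                     paged_incident_ids: Iterable[str]) -> set[str]:
    """Known incidents that the alert set never paged for."""
    return set(incident_ids) - set(paged_incident_ids)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    before = [Page("HighCPU", False)] * 6 + [Page("ApiErrors", True)] * 2
    after = [Page("SLOBurnFast", True)] * 2 + [Page("SLOBurnSlow", False)] * 1
    print(paging_summary(before, weeks=2))
    print(paging_summary(after, weeks=2))
    print(missed_incidents({"INC-1", "INC-2"}, {"INC-1", "INC-2"}))
```

Reporting the missed-incident set alongside the false-positive rate keeps the comparison honest: a quieter pager only counts as an improvement if that set stays empty.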

Earning criteria — what you'll demonstrate

Define SLOs grounded in user-experience signals, not infrastructure metrics
Compute error budgets and burn-rate alerts at multiple windows
Author Prometheus + Alertmanager rule libraries that survive review
Evaluate alerting changes against actual paging history honestly
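To illustrate the rule-authoring criterion, the sketch below builds a PromQL expression for one multi-window burn-rate alert, which could then be placed in the `expr` field of a Prometheus alerting rule. It rests on assumptions: the counter name `http_requests_total`, the `endpoint` and `code` labels, and the `/subscribers` path are hypothetical, and the real metrics export may use different names.

```python
def error_ratio_expr(endpoint: str, window: str) -> str:
    """PromQL for the 5xx error ratio of one endpoint over `window`.

    Metric and label names are assumptions, not taken from the export.
    """
    selector = f'endpoint="{endpoint}"'
    errors = f'sum(rate(http_requests_total{{{selector},code=~"5.."}}[{window}]))'
    total = f'sum(rate(http_requests_total{{{selector}}}[{window}]))'
    return f"({errors} / {total})"


def burn_rate_alert_expr(endpoint: str, slo_target: float, burn_rate: float,
                         long_window: str, short_window: str) -> str:
    """Alert condition: both windows burn budget faster than `burn_rate`."""
    limit = burn_rate * (1.0 - slo_target)
    long_expr = error_ratio_expr(endpoint, long_window)
    short_expr = error_ratio_expr(endpoint, short_window)
    # Parentheses make the grouping explicit for reviewers.
    return f"({long_expr} > {limit:.6g}) and ({short_expr} > {limit:.6g})"


if __name__ == "__main__":
    # Fast-burn tier for a 99.9 percent SLO: 1h long window, 5m short window.
    print(burn_rate_alert_expr("/subscribers", 0.999, 14.4, "1h", "5m"))
```

Generating expressions from a small table of SLO targets and tiers, rather than writing each by hand, is one way to keep six endpoints consistent and easier to review.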

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Software Observability

Master · CS/SE

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Site Reliability Engineer
Platform & Infrastructure

One more thing

You can put a credential on your CV by Friday.
