Define SLOs and Error Budgets for a Real-Time Trading API
Overview
What this challenge is about.
Work through the challenge in five steps:
- Pull 90 days of per-endpoint API latency and error data from Prometheus (anonymized exports are provided).
- Propose Service Level Indicators (SLIs) for three services, with two SLI types each (availability and latency), for six SLIs in total.
- Set candidate Service Level Objectives (SLOs), for example 99.95 percent availability over 30 days or p99 latency under 80 ms over 30 days.
- Simulate what the error budget would have looked like over the past 90 days under each candidate, then tune until the SLOs are achievable but not trivial (around 30 percent error-budget consumption in a normal month).
- Deliver an SLO catalog (PDF), an error-budget policy (when to pause feature work), and a one-page executive summary.
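The error-budget simulation described above can be sketched roughly as follows. This is a minimal illustration, not the challenge's reference solution. It assumes the Prometheus export has already been reduced to one `(total_requests, failed_requests)` pair per day; the function names `budget_consumed` and `rolling_consumption` are hypothetical.

```python
from typing import List, Tuple


def budget_consumed(total: int, bad: int, slo: float) -> float:
    """Fraction of the error budget used: bad events / allowed bad events.

    With an SLO of 0.9995, the budget is 0.05 percent of all requests.
    A result of 1.0 means the budget is exactly exhausted.
    """
    allowed = total * (1.0 - slo)
    if allowed <= 0:
        raise ValueError("no error budget: need total > 0 and slo < 1")
    return bad / allowed


def rolling_consumption(
    daily: List[Tuple[int, int]], slo: float, window: int = 30
) -> List[float]:
    """Budget consumption for every full rolling window of `window` days.

    `daily` holds (total_requests, failed_requests) per day, oldest first.
    This layout is an assumption about how the export was pre-aggregated.
    """
    results = []
    for end in range(window, len(daily) + 1):
        chunk = daily[end - window:end]
        total = sum(t for t, _ in chunk)
        bad = sum(b for _, b in chunk)
        results.append(budget_consumed(total, bad, slo))
    return results
```

As an example, a service that handles 1,000,000 requests a day and fails 150 of them consumes about 30 percent of its budget under a 99.95 percent target, which matches the "normal month" calibration goal. Re-running `rolling_consumption` with a different `slo` shows how quickly the same history would exhaust a tighter budget.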
The Brief
What you'll do, and what you'll demonstrate.
Define SLOs and an error-budget policy for the trading API, validate them against 90 days of historical data, and get them signed off by engineering and product.
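One way to make an error-budget policy concrete is to express it as a small decision rule over budget consumption. The sketch below is illustrative only: the function name `policy_action`, the 50 percent and 100 percent thresholds, and the action labels are all assumptions, not values prescribed by the challenge. A real policy would use whatever engineering and product agree on.

```python
def policy_action(
    consumed: float, warn_at: float = 0.5, freeze_at: float = 1.0
) -> str:
    """Map error-budget consumption in the current window to an action.

    `consumed` is the fraction of the budget already used
    (0.3 = 30 percent, 1.0 = exhausted). The thresholds are
    hypothetical defaults chosen for illustration.
    """
    if consumed < 0:
        raise ValueError("consumed must be non-negative")
    if consumed >= freeze_at:
        return "pause feature work"
    if consumed >= warn_at:
        return "prioritize reliability work"
    return "ship normally"
```

Keeping the thresholds as parameters makes it easy to replay the 90-day history under different policies and see how often each one would have paused feature work.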
Earning criteria — what you'll demonstrate
- Distinguish SLI from SLO and SLA in a real production context
- Choose SLIs that reflect user-perceived experience, not just backend metrics
- Calibrate SLOs against historical data so they are achievable but meaningful
- Write an error-budget policy that engineering and product will both honor
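For the latency side of the criteria above, two common ways to turn raw request timings into an SLI are a percentile (such as p99) and the share of requests faster than a threshold. The sketch below shows both, assuming the latency samples are available as a plain list of milliseconds. The names `percentile_nearest_rank` and `latency_sli` are hypothetical, and the nearest-rank method is one of several percentile definitions, so results may differ slightly from what Prometheus computes from histogram buckets.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_sli(samples: Sequence[float], threshold_ms: float) -> float:
    """Share of requests that completed in under threshold_ms."""
    if not samples:
        raise ValueError("samples must not be empty")
    good = sum(1 for s in samples if s < threshold_ms)
    return good / len(samples)
```

The threshold form fits the same good-events-over-total-events shape as an availability SLI, so it can reuse the same error-budget arithmetic. The percentile form is easier to compare directly with a target such as "p99 under 80 ms".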
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.