Right-Size a Real-Time Recommendation Serving Cluster
Overview
What this challenge is about.
You receive 7 days of request-level telemetry (timestamp, latency, error code, pod) plus the existing Horizontal Pod Autoscaler (HPA) and node-group configs. Analyze traffic patterns and pod utilization, then propose a new autoscaling configuration: HPA tuning, a Vertical Pod Autoscaler, KEDA-style event-driven scaling, or a hybrid. Run a load test (Locust or k6) that reproduces the peak pattern and measure latency under the proposed configuration. Deliver the analysis, the proposed configs, the load-test report, and a 2-page rollout plan.
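As a starting point for the telemetry analysis, here is a minimal sketch that groups requests by hour of day and reports request count and p99 latency per hour. It assumes each row has been reduced to an ISO-8601 timestamp and a latency in milliseconds; the function name `hourly_profile` and the sample rows are hypothetical, not part of the provided dataset.

```python
import math
from collections import defaultdict
from datetime import datetime

def hourly_profile(rows):
    """Return {hour_of_day: (request_count, p99_latency_ms)}.

    rows: iterable of (iso_timestamp, latency_ms) pairs (assumed shape).
    """
    buckets = defaultdict(list)
    for ts, latency_ms in rows:
        buckets[datetime.fromisoformat(ts).hour].append(latency_ms)

    profile = {}
    for hour, latencies in buckets.items():
        latencies.sort()
        # Nearest-rank percentile: smallest value with at least 99% of samples at or below it.
        rank = math.ceil(0.99 * len(latencies))
        profile[hour] = (len(latencies), latencies[rank - 1])
    return profile

# Made-up rows for illustration only.
sample = [
    ("2024-01-01T03:00:05", 40.0),
    ("2024-01-01T03:10:00", 55.0),
    ("2024-01-01T19:00:01", 80.0),
    ("2024-01-01T19:00:02", 120.0),
    ("2024-01-01T19:30:00", 95.0),
]
profile = hourly_profile(sample)
```

Hours with low request counts and comfortable p99 latency are the likely candidates for over-provisioning; joining this profile with per-pod utilization would confirm it.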
The Brief
What you'll do, and what you'll demonstrate.
Cut off-peak serving cost by 30 percent on a real-time recommendation cluster without breaching the p99 latency SLO.
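To sanity-check whether a proposed target could reach the 30 percent goal, a rough sketch like the one below may help. It uses the core scaling rule from the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, and ignores the HPA tolerance band and min/max replica bounds. It also assumes cost scales linearly with replica count, which is a simplification: real savings depend on how pods pack onto nodes. All numbers are illustrative.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    # Core HPA rule; tolerance and minReplicas/maxReplicas are deliberately left out.
    return math.ceil(current_replicas * current_metric / target_metric)

def off_peak_saving(baseline_replicas, proposed_replicas):
    # Assumption: serving cost is proportional to the number of replicas.
    return 1 - proposed_replicas / baseline_replicas

# Illustrative values: 10 replicas off-peak at 35% average CPU, new target of 60%.
proposed = desired_replicas(current_replicas=10, current_metric=35.0, target_metric=60.0)
saving = off_peak_saving(baseline_replicas=10, proposed_replicas=proposed)
meets_goal = saving >= 0.30
```

A result that clears the goal on paper still has to be validated against the p99 latency SLO in the load test.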
Earning criteria — what you'll demonstrate
- Analyze serving telemetry to find over-provisioning
- Choose an autoscaling strategy under latency-SLO constraints
- Run a load test that faithfully reproduces peak traffic
- Write a rollout plan with explicit rollback triggers
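One way to make rollback triggers explicit is to write them down as a check that can be evaluated on each monitoring window. The sketch below is a hypothetical illustration: the thresholds, the `RollbackTriggers` class, and the `rollback_reasons` function are assumptions, not values or names supplied by the challenge.

```python
from dataclasses import dataclass

@dataclass
class RollbackTriggers:
    p99_slo_ms: float      # assumed p99 latency SLO for one window
    max_error_rate: float  # assumed maximum fraction of failed requests

def rollback_reasons(p99_ms, errors, requests, triggers):
    """Return the list of triggers breached in one monitoring window."""
    reasons = []
    if p99_ms > triggers.p99_slo_ms:
        reasons.append("p99 latency above SLO")
    if requests > 0 and errors / requests > triggers.max_error_rate:
        reasons.append("error rate above budget")
    return reasons

# Illustrative thresholds and windows.
triggers = RollbackTriggers(p99_slo_ms=150.0, max_error_rate=0.01)
healthy = rollback_reasons(p99_ms=120.0, errors=3, requests=1000, triggers=triggers)
breach = rollback_reasons(p99_ms=180.0, errors=25, requests=1000, triggers=triggers)
```

An empty list means the rollout can proceed; any entry names the condition that should trigger a rollback, which keeps the plan's criteria unambiguous.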
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
MLOps Engineer
Right-sizing a real-time serving cluster under latency constraints is routine work for MLOps engineers at consumer-ML companies.
This challenge sharpens
- model-serving
- autoscaling
- kubernetes
Data Engineer
Telemetry analysis and capacity planning bridge directly into the data engineer's broader work on pipeline cost discipline.
This challenge sharpens
- python
- cost-optimization
- model-serving
AI Solutions Architect
Designing autoscaling under SLO constraints is the architect's job when sizing real-time AI workloads for enterprise customers.
This challenge sharpens
- model-serving
- autoscaling
- cost-optimization