Tune Autoscaling for a Cost-Sensitive Workload with HPA + KEDA
Overview
What this challenge is about.
You receive the service's current Deployment and HorizontalPodAutoscaler config (static 12-20 replicas), 90 days of traffic logs and Kafka-lag metrics, and the SLA (p99 latency < 250 ms, error rate < 0.1 percent). Replace the hand-written HPA with KEDA ScaledObjects (KEDA creates and manages the underlying HPA) driven by 3 triggers: CPU utilization (baseline), Kafka consumer-group lag (for the async recommendation path), and a custom Prometheus metric (queued-request count). Tune cooldownPeriod, pollingInterval, and the minimum and maximum replica counts for each ScaledObject. Build a traffic-replay harness that replays the 90 days of logs in a staging cluster and simulates the off-peak, peak, and flash-sale phases. Measure cost (replica-hours) and SLA compliance per phase. Deliver the ScaledObject manifests, the traffic-replay harness, the cost-and-SLA report, the rollout plan, and a 5-page write-up for the engineering manager.
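To make the three-trigger setup concrete, here is a minimal sketch that builds a ScaledObject manifest as a Python dict and prints it as JSON (which kubectl also accepts). The field names follow the KEDA ScaledObject schema; every value is a hypothetical placeholder, not part of the brief: the resource and Deployment names, the Kafka broker address, topic, and consumer group, the Prometheus address and query, and all thresholds and timings.

```python
import json
from typing import Any, Dict


def build_scaled_object(
    name: str,
    deployment: str,
    min_replicas: int,
    max_replicas: int,
    polling_interval_s: int,
    cooldown_period_s: int,
) -> Dict[str, Any]:
    """Return a KEDA ScaledObject manifest with CPU, Kafka-lag, and Prometheus triggers."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            "pollingInterval": polling_interval_s,  # seconds between trigger checks
            "cooldownPeriod": cooldown_period_s,    # seconds; see the note below
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [
                # Baseline: target average CPU utilization (placeholder: 60 percent).
                {"type": "cpu", "metricType": "Utilization", "metadata": {"value": "60"}},
                # Async recommendation path: consumer-group lag (all values hypothetical).
                {
                    "type": "kafka",
                    "metadata": {
                        "bootstrapServers": "kafka.staging.svc:9092",
                        "consumerGroup": "recommendations",
                        "topic": "recommendation-requests",
                        "lagThreshold": "500",
                    },
                },
                # Custom metric: queued-request count (query and threshold hypothetical).
                {
                    "type": "prometheus",
                    "metadata": {
                        "serverAddress": "http://prometheus.monitoring.svc:9090",
                        "query": "sum(queued_requests)",
                        "threshold": "100",
                    },
                },
            ],
        },
    }


# Placeholder tuning values; the real ones come out of the replay experiments.
manifest = build_scaled_object("recs-scaler", "recs-service", 4, 20, 15, 300)
print(json.dumps(manifest, indent=2))
```

Trigger metadata values are strings in KEDA, which is why the thresholds are quoted. Per the KEDA documentation, cooldownPeriod governs the wait before scaling back to zero, so with a nonzero minimum you will likely also need the HPA behavior settings under spec.advanced.horizontalPodAutoscalerConfig to damp scale-down; verify this against the KEDA version you deploy.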
The Brief
What you'll do, and what you'll demonstrate.
Replace static replicas with HPA + KEDA autoscaling driven by 3 triggers, and prove a 35-50 percent cost reduction (measured in replica-hours) without breaching the SLA across the off-peak, peak, and flash-sale phases.
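The cost claim reduces to simple arithmetic over sampled replica counts. The sketch below assumes evenly spaced samples and uses made-up hourly replica counts for a single day; the function names and the numbers are illustrative, not data from the challenge.

```python
from typing import Sequence


def replica_hours(replica_counts: Sequence[int], interval_seconds: float) -> float:
    """Sum replica counts, treating each sample as constant for one interval."""
    return sum(replica_counts) * interval_seconds / 3600.0


def cost_reduction_percent(baseline_hours: float, autoscaled_hours: float) -> float:
    """Percentage of replica-hours saved relative to the baseline."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return 100.0 * (baseline_hours - autoscaled_hours) / baseline_hours


# Hypothetical hourly samples for one day.
baseline = [16] * 24                           # static configuration
autoscaled = [6] * 12 + [12] * 10 + [20] * 2   # off-peak, peak, flash-sale hours

base_h = replica_hours(baseline, 3600)     # 384.0 replica-hours
auto_h = replica_hours(autoscaled, 3600)   # 232.0 replica-hours
print(f"{cost_reduction_percent(base_h, auto_h):.1f} percent saved")  # 39.6 percent saved
```

Replica-hours are a proxy for cost; if pods differ in size across configurations, weight each sample by its resource requests before comparing.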
Earning criteria — what you'll demonstrate
- Combine HPA, KEDA, and Prometheus-based triggers correctly
- Tune scaler cooldown and polling intervals to avoid thrashing
- Build a traffic-replay harness that exercises real phases
- Quantify cost vs. SLA trade-offs with evidence
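As one way to back the SLA side of that trade-off with evidence, the sketch below checks each replay phase against the two SLA limits from the brief. It assumes the harness records one latency per request plus an error count per phase; the nearest-rank percentile, the function names, and the sample data are illustrative choices, not a prescribed method.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

P99_LIMIT_MS = 250.0       # SLA: p99 latency < 250 ms
ERROR_RATE_LIMIT = 0.001   # SLA: error rate < 0.1 percent


@dataclass
class PhaseResult:
    p99_ms: float
    error_rate: float
    sla_met: bool


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct percent of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct * n / 100
    return ordered[rank - 1]


def evaluate_phase(latencies_ms: Sequence[float], error_count: int) -> PhaseResult:
    """Compute p99 latency and error rate for one phase and compare both to the SLA."""
    p99 = percentile_nearest_rank(latencies_ms, 99)
    error_rate = error_count / len(latencies_ms)
    met = p99 < P99_LIMIT_MS and error_rate < ERROR_RATE_LIMIT
    return PhaseResult(p99, error_rate, met)


# Hypothetical replay output: (per-request latencies in ms, error count) per phase.
phases: Dict[str, Tuple[List[float], int]] = {
    "off-peak": ([80.0] * 1000, 0),
    "peak": ([100.0] * 990 + [300.0] * 10, 0),
    "flash-sale": ([120.0] * 980 + [400.0] * 20, 2),
}

report = {name: evaluate_phase(lat, errs) for name, (lat, errs) in phases.items()}
for name, result in report.items():
    print(name, result)
```

Pairing each phase's result with its replica-hours gives the cost-versus-SLA table for the report. In this made-up data the flash-sale phase fails both limits, which is the kind of finding that should drive another tuning pass.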
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.