Overview
What this challenge is about.
Design the topology: model artifact storage, regional inference fleets (Triton, vLLM, or BentoML), a traffic router, and an observability stack (Prometheus + Grafana). Pick a rollout strategy (blue/green, canary, or shadow) and justify it against the SLA. Prototype the smallest end-to-end version using a public model (e.g., DistilBERT) in two low-cost regions, and demonstrate p99 latency, recovery time after a forced region failure, and observability dashboards that an SRE would accept. Finally, write a 5-page design doc for the platform architect.
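To make "demonstrate p99 latency" concrete, here is a minimal sketch that computes a nearest-rank p99 from raw per-request latencies and checks it against a p99 budget. It assumes a load generator that records one latency per request. The function names and the `p99_budget_ms` parameter are hypothetical and not part of Triton, vLLM, or BentoML. In the prototype itself you would more likely read p99 from Prometheus, which estimates quantiles from histogram buckets rather than from raw samples.

```python
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], q: int) -> float:
    """Return the q-th percentile (1 <= q <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 1 <= q <= 100:
        raise ValueError("q must be an integer in [1, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank = ceil(q * n / 100), computed with integer ceiling division
    # so boundary cases are exact (no float rounding).
    rank = -(-q * len(ordered) // 100)
    return ordered[rank - 1]


def meets_p99_budget(samples_ms: Sequence[float], p99_budget_ms: float) -> bool:
    """True if observed p99 latency is within the (hypothetical) SLA budget."""
    return percentile_nearest_rank(samples_ms, 99) <= p99_budget_ms


if __name__ == "__main__":
    # 1,000 requests: 990 fast, 10 slow. The slow 1% sits just past the p99 rank.
    samples = [40.0] * 990 + [400.0] * 10
    print(percentile_nearest_rank(samples, 99))  # 40.0
    # One more slow request moves p99 onto the tail.
    samples = [40.0] * 989 + [400.0] * 11
    print(percentile_nearest_rank(samples, 99))  # 400.0
```

The two runs show why a p99 SLA is a statement about the tail: shifting a single request out of 1,000 changes the reported p99 tenfold, so the sample size behind any p99 you demonstrate matters.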
The Brief
What you'll do, and what you'll demonstrate.
Design and prototype a multi-region, SLA-compliant online inference service with verified failover behavior.
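One way to turn "verified failover behavior" into a number is to run a synthetic probe against the global endpoint while you take one region down, then compute recovery time from the probe log. The sketch below assumes probe timestamps are in seconds relative to the moment the fault is injected. It treats the service as recovered only after a hypothetical `stable_successes` run of consecutive good probes. The `Probe` record and that threshold are illustrative choices, not a standard definition.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Probe:
    t_s: float  # seconds since the region failure was injected (hypothetical clock)
    ok: bool    # True if the request succeeded within the latency budget


def recovery_time_s(probes: Iterable[Probe], stable_successes: int = 3) -> Optional[float]:
    """Seconds from fault injection to the start of the first run of
    `stable_successes` consecutive successful probes that follows a failure.

    Returns 0.0 if no probe failed (clients never noticed the failover)
    and None if the service never stabilized within the probe window.
    """
    saw_failure = False
    run_start: Optional[float] = None
    run_len = 0
    for p in sorted(probes, key=lambda p: p.t_s):
        if p.t_s < 0:
            continue  # ignore probes sent before the fault was injected
        if not p.ok:
            saw_failure = True
            run_start, run_len = None, 0  # any failure resets the success streak
            continue
        if not saw_failure:
            continue  # successes before the first failure are not "recovery"
        if run_len == 0:
            run_start = p.t_s
        run_len += 1
        if run_len >= stable_successes:
            return run_start
    return None if saw_failure else 0.0


if __name__ == "__main__":
    # Region A is killed at t=0; traffic flaps once before settling on region B.
    log = [Probe(0, True), Probe(1, False), Probe(2, False), Probe(3, True),
           Probe(4, False), Probe(5, True), Probe(6, True), Probe(7, True)]
    print(recovery_time_s(log))                      # 5
    print(recovery_time_s(log, stable_successes=1))  # 3
```

Requiring a streak keeps a single lucky request during DNS or health-check flapping from being reported as recovery: the example log reports 5 s with a streak of three but an optimistic 3 s with a streak of one.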
Earning criteria — what you'll demonstrate
- Design an SLA-driven inference topology across regions
- Apply blue/green, canary, and shadow rollout patterns correctly
- Stand up production-grade observability for ML serving
- Defend a topology choice in writing to a platform architect
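The three rollout patterns differ mainly in where the router sends each request. Below is a minimal sketch that assumes the router sees a stable per-request key (for example, a user or session ID). The target names `blue`/`green`, `stable`, and `candidate` are hypothetical. Blue/green sends everything to whichever color is live. Canary hashes the key so a fixed slice of users consistently hits the candidate. Shadow serves from stable while mirroring a copy to the candidate.

```python
import hashlib
from typing import Optional, Tuple

BUCKETS = 10_000  # 0.01% routing granularity


def bucket_for(request_key: str) -> int:
    """Deterministically map a key to [0, BUCKETS) so a user stays on one side of a canary."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(
    request_key: str,
    strategy: str,
    canary_percent: float = 0.0,
    live_color: str = "blue",
) -> Tuple[str, Optional[str]]:
    """Return (serving_target, mirror_target).

    Only serving_target's response reaches the client. mirror_target (shadow only)
    receives a copy of the request; its response should be recorded for
    comparison and never returned.
    """
    if strategy == "blue_green":
        # Cutover and rollback are a single flip of live_color.
        return live_color, None
    if strategy == "canary":
        if not 0.0 <= canary_percent <= 100.0:
            raise ValueError("canary_percent must be in [0, 100]")
        in_canary = bucket_for(request_key) < canary_percent * BUCKETS / 100
        return ("candidate" if in_canary else "stable"), None
    if strategy == "shadow":
        return "stable", "candidate"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Hashing rather than random sampling keeps each user on one side for the whole canary, which keeps per-cohort error rates and latency comparable. Raising `canary_percent` only adds users to the candidate cohort and never moves existing canary users back to stable.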
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Solutions Architect
Designing multi-region inference topologies against hard SLAs is exactly the work AI solutions architects own at fintech and enterprise customers.
This challenge sharpens
- multi-region-deployment
- inference-serving
- sla-engineering
MLOps Engineer
Standing up observability and rollout strategies for ML serving is everyday MLOps work, and this challenge gives you a concrete deployment story to point to.
This challenge sharpens
- inference-serving
- observability
- kubernetes
Machine Learning Engineer
MLEs increasingly own serving topology in cross-functional pods; this challenge carries your modeling skills into operational reality.
This challenge sharpens
- inference-serving
- load-balancing
- observability