Overview
What this challenge is about.
Pick the priority product (we recommend the customer-service RAG assistant, at around 40k queries/day). Define the monitoring signals: input drift (Evidently/NannyML), output quality (LLM-as-judge sampling, refusal rate), cost (tokens per query), and latency. Stand up the collection pipeline (Kafka, or a SaaS such as Arize or WhyLabs), the storage layer, and a Grafana or Arize dashboard. Set up two alerts: a slow one (drift over 7 days) and a fast one (a cost spike detected in under 1 hour). Write a 5-page playbook the team can use to onboard the other 5 products.
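To make the four signals concrete, here is a minimal sketch in plain Python of what a per-query log record and a batch summary might look like. The `QueryRecord` fields, the `summarize` helper, and the choice of query length as the drift feature are all hypothetical. The hand-rolled Population Stability Index (PSI) only stands in for the drift metrics a tool such as Evidently or NannyML would compute for you.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QueryRecord:
    """One logged query. All field names are hypothetical."""
    query_chars: int      # input feature used for drift checks
    total_tokens: int     # prompt + completion tokens (cost signal)
    latency_ms: float     # end-to-end latency
    refused: bool         # whether the assistant declined to answer


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def summarize(records: List[QueryRecord]) -> dict:
    """Aggregate quality, cost, and latency signals over a batch of records."""
    latencies = sorted(r.latency_ms for r in records)
    p95_index = max(0, math.ceil(len(latencies) * 95 / 100) - 1)
    return {
        "refusal_rate": sum(r.refused for r in records) / len(records),
        "tokens_per_query": sum(r.total_tokens for r in records) / len(records),
        "latency_p95_ms": latencies[p95_index],
    }
```

In a real pipeline these summaries would be computed per time window and written to the storage layer behind the dashboard. The sketch leaves LLM-as-judge scoring out, since that depends on the judge model and rubric you choose.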
The Brief
What you'll do, and what you'll demonstrate.
Stand up an end-to-end monitoring stack for one LLM-backed product and write the playbook to onboard five more.
Earning criteria — what you'll demonstrate
- Design monitoring for both classical ML drift and LLM-specific quality
- Implement an end-to-end collection-to-dashboard pipeline
- Set up alerts that page the right people without crying wolf
- Write a playbook that scales monitoring across a product portfolio
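One way to think about the slow and fast alerts is as two small rules with different windows. The sketch below is illustrative only: the function names, the 0.2 drift threshold, and the 2x spike ratio are assumptions, not values from the brief. Reading "drift over 7 days" as a breach sustained on every one of the last seven days is one possible interpretation, chosen here because it helps avoid paging on a single noisy day.

```python
from typing import Sequence


def slow_drift_alert(daily_drift: Sequence[float],
                     threshold: float = 0.2,
                     days: int = 7) -> bool:
    """Fire only if the drift score exceeded `threshold` on each of the last `days` days."""
    recent = daily_drift[-days:]
    return len(recent) == days and all(score > threshold for score in recent)


def fast_cost_alert(last_hour_tokens_per_query: float,
                    baseline_tokens_per_query: float,
                    spike_ratio: float = 2.0) -> bool:
    """Fire when the last hour's tokens per query reach `spike_ratio` times the baseline."""
    if baseline_tokens_per_query <= 0:
        return False  # no usable baseline yet, so stay quiet rather than page
    return last_hour_tokens_per_query / baseline_tokens_per_query >= spike_ratio
```

For example, `slow_drift_alert([0.3] * 7)` fires, while six high days followed by one normal day does not. In Grafana or a SaaS monitor the same logic would normally be expressed as alert rules with an evaluation window instead of application code.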
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
MLOps Engineer
Owning monitoring for LLM-backed products, from drift to cost, is the MLOps platform work that enterprise AI teams urgently need after an incident.
This challenge sharpens
- model-monitoring
- data-drift-detection
- observability
AI Engineer
Wiring up LLM-as-judge sampling and refusal-rate metrics is core AI-engineer work on any team shipping production LLM features.
This challenge sharpens
- llm-evaluation
- model-monitoring
- alerting
AI Solutions Architect
Designing the monitoring stack and writing the cross-portfolio playbook is the architectural work AI solutions architects own at enterprise customers.
This challenge sharpens
- model-monitoring
- observability
- grafana