A/B-Test a Recommender Improvement Without Breaking Trust
Overview
What this challenge is about.
You receive offline-evaluation results for the production and candidate models, plus aggregate metrics from the last 12 weeks (recipe views, save rate, weekly active users, complaint rate, churn). Design the A/B test: hypothesis, primary metric, guardrail metrics, sample size for the desired minimum detectable effect, randomization unit, and stopping rule. Pre-register the analysis: which statistical test you will run and which correction you will apply (for example, for multiple comparisons). Then produce a fill-in-the-blanks post-test memo so the team can report results consistently in future tests.
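As a rough illustration of the sample-size step, the sketch below uses the standard normal-approximation formula for comparing two proportions (such as save rate) with a two-sided test. The function name `sample_size_per_arm`, the 10% baseline, and the one-point absolute lift are hypothetical values chosen for the example, not figures from the challenge data.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of
    `mde_abs` over `baseline_rate` with a two-sided two-proportion z-test.

    Assumes equal-sized arms and independent users (user-level randomization).
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)   # sum of the Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2)


# Hypothetical inputs: 10% baseline save rate, 1 percentage point minimum detectable effect.
n_per_arm = sample_size_per_arm(baseline_rate=0.10, mde_abs=0.01)
print(f"Users needed per arm: {n_per_arm}")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the MDE is best fixed before the test starts rather than tuned afterwards.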
The Brief
What you'll do, and what you'll demonstrate.
Design a trustworthy A/B test for a recommender upgrade with explicit guardrails and a pre-registered analysis plan.
Earning criteria — what you'll demonstrate
- Design a live A/B test with appropriate guardrails for ML deployments
- Compute required sample size for a target minimum detectable effect
- Pre-register an analysis plan to prevent post-hoc metric-hunting
- Translate offline ML metrics into product-grade success/guardrail metrics
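One way to make the pre-registered correction concrete is to fix, in advance, how p-values from several guardrail metrics will be judged together. The sketch below applies the Holm-Bonferroni step-down procedure, which is only one possible choice of correction; the brief does not prescribe a method. The helper name `holm_bonferroni` and the p-values are hypothetical and serve only to show the mechanics.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    `p_values` maps metric name -> p-value. Returns metric name -> True
    if that metric's null hypothesis is rejected at family-wise level `alpha`.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])  # smallest p first
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (0-indexed rank) is compared to alpha / (m - rank).
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return rejected


# Hypothetical guardrail p-values, for illustration only.
guardrail_p_values = {
    "complaint_rate": 0.01,
    "churn": 0.04,
    "weekly_active_users": 0.03,
}
print(holm_bonferroni(guardrail_p_values))
```

Writing the procedure and the metric list down before launch means nobody can later choose whichever metric happens to look significant.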
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Product Manager
Writing trustworthy test plans with guardrail metrics is the central AI-PM craft and weighs heavily in interviews at consumer-AI startups.
This challenge sharpens
- experiment-design
- metric-design
- guardrail-metrics
Data Scientist
Pre-registered analysis plans and sample-size discipline are exactly what hiring managers look for in data-scientist candidates joining experimentation-platform teams.
This challenge sharpens
- ab-testing
- statistical-analysis
- experiment-design
Applied AI Scientist
Bridging offline ML metrics to live product metrics with rigour is a defining applied-AI-scientist skill.
This challenge sharpens
- metric-design
- ml-problem-scoping
- experiment-design