Overview
What this challenge is about.
You receive 60 days of anonymized impression/click logs covering around 200 content items, along with user features (cohort, listening-history bucket). Build a contextual-bandit simulator with off-policy evaluation via inverse propensity scoring (IPS). Implement epsilon-greedy, Thompson sampling (with a Beta-Bernoulli per-arm prior, optionally extended to a logistic model), and UCB1, and compare them on the offline log. Strategies are scored on (a) IPS-estimated reward, (b) coverage of long-tail content, and (c) per-cohort fairness (no cohort is starved). Recommend one strategy for a 4-week live A/B test and give your rationale.
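To make the IPS step concrete, here is a minimal sketch of the estimator. It assumes each log row carries the probability with which the logging policy showed the item (the brief does not say whether the dataset includes this; if it does not, the propensities would have to be estimated first). The row layout, `ips_estimate`, `target_prob`, and `clip` are hypothetical names chosen for illustration.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

# Hypothetical row layout (an assumption, not the dataset's real schema):
# (context, logged_item, click_reward, logging_propensity)
LogRow = Tuple[Hashable, int, float, float]


def ips_estimate(
    logs: Iterable[LogRow],
    target_prob: Callable[[Hashable, int], float],
    clip: Optional[float] = None,
) -> float:
    """Estimate the average reward of a target policy from logged data.

    Each logged reward is reweighted by
    target_prob(context, item) / logging_propensity.
    `clip` optionally caps the weight, trading some bias for lower variance.
    """
    total = 0.0
    n = 0
    for context, item, reward, propensity in logs:
        if propensity <= 0.0:
            raise ValueError("logging propensity must be positive")
        weight = target_prob(context, item) / propensity
        if clip is not None:
            weight = min(weight, clip)
        total += weight * reward
        n += 1
    if n == 0:
        raise ValueError("empty log")
    return total / n


# Toy log: two cohorts, two items, made-up propensities.
toy_logs = [
    ("cohort_a", 0, 1.0, 0.5),
    ("cohort_a", 1, 0.0, 0.5),
    ("cohort_b", 0, 1.0, 0.25),
    ("cohort_b", 1, 1.0, 0.75),
]


def always_item_0(context: Hashable, item: int) -> float:
    """Deterministic target policy that always shows item 0."""
    return 1.0 if item == 0 else 0.0


print(ips_estimate(toy_logs, always_item_0))          # 1.5
print(ips_estimate(toy_logs, always_item_0, clip=3))  # 1.25
```

The unclipped toy estimate exceeds 1 even though clicks are binary, which shows how a few rows with small propensities can dominate the estimate. Weight clipping, as in the second call, is one common way to tame that variance at the cost of some bias.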
The Brief
What you'll do, and what you'll demonstrate.
Offline-evaluate three exploration strategies for a meditation-app recommender and recommend one for the next live A/B.
Earning criteria — what you'll demonstrate
- Implement epsilon-greedy, Thompson sampling, and UCB1 from scratch
- Apply inverse propensity scoring for off-policy evaluation
- Reason about exploration-exploitation trade-offs on real production logs
- Translate offline-evaluation results into an A/B test design
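As a starting point for the "from scratch" criterion, the arm-selection rules of the three strategies can be sketched as below. This is a simplified, non-contextual version that assumes binary click rewards; one simple way to add context would be to keep a separate `ArmStats` per cohort or history bucket. `ArmStats` and the function names are hypothetical, and the UCB1 bonus uses the standard `sqrt(2 ln t / n_a)` form.

```python
import math
import random


class ArmStats:
    """Per-arm pull and click counters (assumes rewards are 0 or 1)."""

    def __init__(self, n_arms: int) -> None:
        self.pulls = [0] * n_arms
        self.successes = [0.0] * n_arms

    def update(self, arm: int, reward: float) -> None:
        self.pulls[arm] += 1
        self.successes[arm] += reward

    def mean(self, arm: int) -> float:
        # Unpulled arms default to 0.0 so they never win a greedy pick by accident.
        return self.successes[arm] / self.pulls[arm] if self.pulls[arm] else 0.0


def epsilon_greedy(stats: ArmStats, epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniform random arm, else the best mean."""
    n_arms = len(stats.pulls)
    if rng.random() < epsilon:
        return rng.randrange(n_arms)
    return max(range(n_arms), key=stats.mean)


def thompson_beta(
    stats: ArmStats, rng: random.Random, alpha0: float = 1.0, beta0: float = 1.0
) -> int:
    """Sample each arm's click rate from its Beta posterior; pick the largest."""
    n_arms = len(stats.pulls)
    samples = [
        rng.betavariate(
            alpha0 + stats.successes[a],
            beta0 + stats.pulls[a] - stats.successes[a],
        )
        for a in range(n_arms)
    ]
    return max(range(n_arms), key=lambda a: samples[a])


def ucb1(stats: ArmStats) -> int:
    """Pull each arm once, then maximize mean + sqrt(2 ln t / n_a)."""
    n_arms = len(stats.pulls)
    for a in range(n_arms):
        if stats.pulls[a] == 0:
            return a
    total = sum(stats.pulls)
    return max(
        range(n_arms),
        key=lambda a: stats.mean(a)
        + math.sqrt(2.0 * math.log(total) / stats.pulls[a]),
    )
```

Passing a seeded `random.Random` instance to the two randomized rules keeps simulator runs reproducible, which makes the offline comparison between strategies easier to audit.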
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career paths this builds toward
Canonical roles
Data Scientist
Offline-evaluating exploration strategies on a real recommender log is the day-one job of growth-leaning data scientists at consumer-AI startups.
This challenge sharpens
- contextual-bandits
- off-policy-evaluation
- exploration
Machine Learning Engineer
Implementing and testing three exploration strategies and shipping the winner to a live A/B is core MLE work in recommender teams.
This challenge sharpens
- thompson-sampling
- ucb
- python
Applied AI Scientist
Trading off exploration, fairness, and long-tail coverage is the kind of judgement applied AI scientists bring to ranking and recommendation problems.
This challenge sharpens
- contextual-bandits
- exploration
- off-policy-evaluation