Overview
What this challenge is about.
Use the public Last.fm-360K dataset (anonymized listening histories) or a similar one as a stand-in. Last.fm-360K records play counts per user and artist rather than per track, so either aggregate content features to the artist level or substitute a track-level dataset. Implement a baseline matrix-factorization recommender, then a hybrid that adds track-level content features (tempo and key in the style of Spotify audio features, plus genre embeddings; all of these can be stubbed). Evaluate with NDCG@10 (Normalized Discounted Cumulative Gain at rank 10, a ranking metric), both overall AND for the cold-start slice (users with fewer than 20 plays). Write a 3-page memo recommending whether to A/B test the hybrid in production.
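A minimal sketch of the evaluation described above. It assumes binary relevance (a held-out user-item pair counts as relevant), the usual log2 position discount, and that "plays" for the cold-start cutoff means total plays in the user's training history. The names `ndcg_at_k` and `sliced_ndcg` are hypothetical, and graded relevance (for example, weighting by play count) is an equally valid choice.

```python
import math
from typing import Dict, List, Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k for one user's ranked recommendation list."""
    # DCG: a relevant item at 0-based position i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (capped at k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")


def sliced_ndcg(
    recs: Dict[str, List[str]],         # user -> ranked recommended item ids
    held_out: Dict[str, Set[str]],      # user -> held-out (test) item ids
    train_plays: Dict[str, int],        # user -> total plays in training data
    k: int = 10,
    cold_start_max_plays: int = 20,     # cold start = fewer than this many plays
) -> Dict[str, float]:
    """Mean NDCG@k overall and for the cold-start slice."""
    overall: List[float] = []
    cold: List[float] = []
    for user, relevant in held_out.items():
        if not relevant:
            continue  # users with no held-out items carry no ranking signal
        score = ndcg_at_k(recs.get(user, []), relevant, k)
        overall.append(score)
        if train_plays.get(user, 0) < cold_start_max_plays:
            cold.append(score)
    return {
        "overall": _mean(overall),
        "cold_start": _mean(cold),
        "n_overall": len(overall),
        "n_cold_start": len(cold),
    }
```

Reporting the slice sizes next to the means matters: a cold-start mean computed over a handful of users can swing widely between runs.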
The Brief
What you'll do, and what you'll demonstrate.
Decide whether a content-collaborative hybrid recommender beats the current collaborative baseline on cold-start users, with evidence the product team can act on.
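One way to turn the cold-start comparison into evidence is a paired bootstrap over users: resample users with replacement and recompute the mean NDCG@10 gain of the hybrid over the baseline. This sketch assumes per-user NDCG@10 scores for both models have already been computed on the same cold-start users. `paired_bootstrap_ci` is a hypothetical helper, and the percentile interval is one common choice, not something the brief prescribes.

```python
import random
from typing import Dict, Tuple


def paired_bootstrap_ci(
    baseline: Dict[str, float],   # user -> NDCG@10 under the baseline model
    hybrid: Dict[str, float],     # user -> NDCG@10 under the hybrid model
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Mean per-user gain (hybrid - baseline) with a percentile bootstrap CI."""
    users = sorted(set(baseline) & set(hybrid))  # pair scores on shared users
    if not users:
        raise ValueError("no users scored by both models")
    diffs = [hybrid[u] - baseline[u] for u in users]
    n = len(diffs)
    rng = random.Random(seed)
    # Resampling the per-user differences keeps each user's two scores paired.
    means = sorted(
        sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return sum(diffs) / n, means[lo_idx], means[hi_idx]
```

If the lower bound stays above zero, the cold-start gain is probably not just sampling noise among users; an interval that straddles zero argues against spending A/B-test traffic yet.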
Earning criteria — what you'll demonstrate
- Implement and tune matrix-factorization and hybrid recommenders
- Slice evaluation metrics by user segment to find where models actually differ
- Estimate sample sizes for an online A/B test before recommending one
- Trade off accuracy gains against latency and memory cost
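The sample-size criterion can be sketched with the standard normal-approximation formula for a two-sided, two-proportion z-test. The brief does not name an online metric, so this assumes a hypothetical binary per-user outcome (for example, whether a user streams at least one recommended track) and a relative minimum detectable lift; `samples_per_arm` is an illustrative name, and the rates below are placeholders rather than figures from any dataset.

```python
import math
from statistics import NormalDist


def samples_per_arm(
    baseline_rate: float,        # hypothetical control conversion rate
    min_detectable_lift: float,  # relative lift, e.g. 0.05 for +5%
    alpha: float = 0.05,         # two-sided significance level
    power: float = 0.8,          # 1 - beta
) -> int:
    """Users per arm for a two-sided two-proportion z-test (normal approx.)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if min_detectable_lift == 0:
        raise ValueError("lift must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # n = (z_{1-a/2} + z_{1-b})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)
```

For example, `samples_per_arm(0.10, 0.05)` asks how many users each arm needs to detect a 5% relative lift on a hypothetical 10% baseline. If the test targets only cold-start users, dividing that figure by the daily number of eligible cold-start users gives a rough test duration, which is often the deciding factor in the memo.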
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Machine Learning Engineer
Recommender bake-offs with cold-start slices and an A/B-test recommendation are routine deliverables for MLEs at consumer-AI companies.
This challenge sharpens
- recommender-systems
- model-evaluation
- ml-pipelines
Data Scientist
Sample-size estimation and slice-based evaluation are core data-science skills that any product team values.
This challenge sharpens
- ab-test-design
- model-evaluation
- feature-engineering
Applied AI Scientist
Turning a model comparison into a costed product recommendation is core day-to-day work for an applied AI scientist.
This challenge sharpens
- recommender-systems
- feature-engineering
- ab-test-design