Explore the Bias-Variance Trade-off on a Tabular Healthcare Cohort
Overview
What this challenge is about.
You receive a de-identified tabular dataset covering 90,000 patients (demographics, labs, claims-derived features) with a binary 12-month readmission outcome. Pick three model families (regularized logistic regression, random forest, gradient boosting). For each, produce learning curves (training-set size on the x-axis) and validation curves (a key hyperparameter on the x-axis), and decompose the test error into bias, variance, and noise using repeated bootstrap on a smaller subsample. Synthesize a 3-page methodology note explaining what the curves imply for the choice of next model.
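As a starting point, the two kinds of curves can be sketched with scikit-learn's `learning_curve` and `validation_curve`. This is a minimal sketch, not the challenge's reference solution: the data is synthetic (`make_classification` stands in for the cohort), only the logistic-regression family is shown, and the 5-fold split, log-loss metric, and `C` grid are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): imbalanced binary outcome.
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=8,
    weights=[0.8, 0.2], random_state=0,
)

# One of the three families; the other two would be swapped in the same way.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Learning curve: cross-validated error as the training-set size grows.
sizes, lc_train, lc_val = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=cv, scoring="neg_log_loss",
)

# Validation curve: cross-validated error as one hyperparameter varies.
# C is the inverse regularization strength of LogisticRegression.
C_range = np.logspace(-3, 2, 6)
vc_train, vc_val = validation_curve(
    model, X, y,
    param_name="logisticregression__C", param_range=C_range,
    cv=cv, scoring="neg_log_loss",
)

# Scores are negated log loss, so flip the sign to report an error.
for n, tr, va in zip(sizes, -lc_train.mean(axis=1), -lc_val.mean(axis=1)):
    print(f"n_train={n:5d}  train_loss={tr:.3f}  val_loss={va:.3f}")
for c, tr, va in zip(C_range, -vc_train.mean(axis=1), -vc_val.mean(axis=1)):
    print(f"C={c:8.3f}  train_loss={tr:.3f}  val_loss={va:.3f}")
```

A persistent gap between training and validation loss that narrows as data is added suggests variance dominates; two curves that converge early at a high loss suggest bias dominates.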
The Brief
What you'll do, and what you'll demonstrate.
Use learning and validation curves to empirically characterize the bias-variance trade-off across three model families on a real healthcare cohort.
Earning criteria — what you'll demonstrate
- Generate and interpret learning curves and validation curves
- Decompose test error into bias, variance, and irreducible noise empirically
- Connect the bias-variance trade-off to concrete next-step model choices
- Defend modelling choices in writing for a clinically literate audience
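The empirical decomposition can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: it uses squared error on predicted probabilities, a synthetic one-feature problem where the true probability `true_prob` is known, and a hypothetical stand-in model (a polynomial least-squares fit clipped to [0, 1]) instead of the three families named in the brief. On the real cohort the true probability is unknown, so the noise term cannot be read off directly and has to be estimated or reported together with the bias term. Bootstrap resamples of a single training set also only approximate the variance across independent training sets.

```python
import numpy as np

def decompose_squared_error(pred, p_true):
    """Split expected squared error into noise, bias^2, and variance.

    pred:   (n_models, n_test) predicted probabilities, one row per bootstrap fit
    p_true: (n_test,) true P(y = 1 | x), known here only because data is synthetic
    """
    mean_pred = pred.mean(axis=0)
    noise = np.mean(p_true * (1.0 - p_true))       # Var(y | x) of a Bernoulli outcome
    bias_sq = np.mean((mean_pred - p_true) ** 2)   # squared gap of the average model
    variance = np.mean(pred.var(axis=0))           # spread across bootstrap fits
    # Expected squared error against a fresh label, averaged over fits and points.
    expected_error = np.mean(p_true * (1.0 - p_true) + (pred - p_true) ** 2)
    return {"noise": noise, "bias_sq": bias_sq,
            "variance": variance, "expected_error": expected_error}

def bootstrap_predictions(x_train, y_train, x_test, degree, n_boot, rng):
    """Refit a clipped polynomial model (hypothetical stand-in) on bootstrap resamples."""
    preds = np.empty((n_boot, x_test.size))
    for b in range(n_boot):
        idx = rng.integers(0, x_train.size, size=x_train.size)  # sample with replacement
        coefs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
        preds[b] = np.clip(np.polyval(coefs, x_test), 0.0, 1.0)
    return preds

def true_prob(x):
    """Assumed ground-truth outcome probability for the synthetic example."""
    return 1.0 / (1.0 + np.exp(-3.0 * np.sin(2.0 * x)))

rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=400)
y_train = rng.binomial(1, true_prob(x_train))
x_test = np.linspace(-2.0, 2.0, 200)

results = {}
for degree in (1, 3, 9):  # low to high model flexibility
    preds = bootstrap_predictions(x_train, y_train, x_test, degree, n_boot=200, rng=rng)
    results[degree] = decompose_squared_error(preds, true_prob(x_test))
    r = results[degree]
    print(f"degree={degree}  noise={r['noise']:.4f}  bias^2={r['bias_sq']:.4f}  "
          f"variance={r['variance']:.4f}  total={r['expected_error']:.4f}")
```

For each model the three terms sum to the expected squared error, so the table shows directly whether added flexibility is reducing bias faster than it adds variance.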
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career paths this builds toward
Canonical roles
ML Researcher
Empirical bias-variance studies tied to real datasets and real next-step decisions are exactly the deliverable that ML research hiring loops grade.
This challenge sharpens
- bias-variance-tradeoff
- learning-curves
- model-selection
Applied AI Scientist
Defending modelling choices to clinically literate stakeholders is core applied-AI-scientist craft at any healthtech company.
This challenge sharpens
- regularization
- model-selection
- bias-variance-tradeoff
Data Scientist
Producing principled validation evidence (not vibes) for modelling decisions is a senior data scientist's responsibility on any regulated team.
This challenge sharpens
- regularization
- ensemble-methods
- model-selection