Overview
What this challenge is about.
You receive (1) a vendor-supplied multi-label chest-X-ray classifier, (2) the current single-site held-out evaluation set, and (3) a 12,000-image multi-site evaluation set with labels for 14 findings per image. Compute per-finding AUROC and AUPRC, expected calibration error (ECE), and per-site drift on key input statistics (image-quality features). Identify the worst per-site / per-finding cells. Propose two mitigations: site-specific recalibration and focused re-training. Wrap the results into a 6-page deployment plan with a go/no-go decision for each site.
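As a rough illustration of the metric step, the sketch below computes AUROC, AUPRC (as average precision), and a binned ECE for every site / finding cell, then lists the weakest cells. It is a minimal standard-library sketch, not the challenge's reference implementation: the `(site, finding, label, prob)` record layout, the function names, and the 10-bin ECE are all assumptions, and a real audit would more likely use an established metrics library.

```python
import math
from collections import defaultdict


def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return float("nan")  # undefined when one class is absent from the cell
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - pos * (pos + 1) / 2) / (pos * neg)


def average_precision(labels, scores):
    """AUPRC as average precision; tied scores are not treated specially."""
    pos = sum(labels)
    if pos == 0:
        return float("nan")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision at each positive
    return total / pos


def ece(labels, probs, n_bins=10):
    """Binned ECE: weighted gap between mean predicted probability and observed rate."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    total = 0.0
    for b in bins:
        if b:
            observed = sum(y for y, _ in b) / len(b)
            predicted = sum(p for _, p in b) / len(b)
            total += len(b) / len(labels) * abs(observed - predicted)
    return total


def audit(records):
    """records: iterable of (site, finding, label, prob) tuples (assumed layout)."""
    cells = defaultdict(lambda: ([], []))
    for site, finding, y, p in records:
        cells[(site, finding)][0].append(y)
        cells[(site, finding)][1].append(p)
    return {
        cell: {"auroc": auroc(ys, ps),
               "auprc": average_precision(ys, ps),
               "ece": ece(ys, ps)}
        for cell, (ys, ps) in cells.items()
    }


def worst_cells(table, metric="auroc", k=3):
    """Return the k cells with the lowest `metric`, skipping undefined values."""
    defined = [(c, m[metric]) for c, m in table.items() if not math.isnan(m[metric])]
    return sorted(defined, key=lambda item: item[1])[:k]
```

For ECE, where higher is worse, the same ranking idea applies with the sort order reversed. Cells with very few positives give unstable AUROC and AUPRC estimates, so cell sizes are worth reporting alongside each number.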
The Brief
What you'll do, and what you'll demonstrate.
Run a per-site, per-finding audit of a chest-X-ray classifier and produce a go/no-go deployment plan for each of 5 sites.
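The drift part of the audit can be sketched as a per-site comparison of each image-quality feature against a reference site. The example below uses the two-sample Kolmogorov-Smirnov statistic as one possible drift measure; the brief does not prescribe a test, and the feature name `mean_intensity`, the site names, and the nested-dictionary layout are assumptions made for illustration.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    for v in sorted(set(a) | set(b)):
        # Advance each pointer past all values <= v to get the ECDF at v.
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def drift_table(features_by_site, reference_site):
    """Compare every feature at every other site against the reference site.

    features_by_site: {site: {feature_name: [values, ...]}} (assumed layout).
    Returns {(site, feature_name): ks_statistic}.
    """
    ref = features_by_site[reference_site]
    table = {}
    for site, feats in features_by_site.items():
        if site == reference_site:
            continue
        for name, values in feats.items():
            table[(site, name)] = ks_statistic(ref[name], values)
    return table


# Toy usage with made-up numbers; real inputs would be per-image feature values.
toy = {
    "reference": {"mean_intensity": [1, 2, 3, 4]},
    "site_b": {"mean_intensity": [3, 4, 5, 6]},
}
drift = drift_table(toy, "reference")
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap). With thousands of images per site, even small shifts become statistically significant, so the effect size is usually more informative for a go/no-go discussion than a p-value alone.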
Earning criteria — what you'll demonstrate
- Apply rigorous multi-site evaluation to a clinical imaging classifier
- Detect input distribution drift across hospital sites
- Propose mitigations grounded in real per-site / per-finding evidence
- Communicate per-site deployment decisions to a clinical-AI committee
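For the recalibration mitigation, one common option is temperature scaling fitted separately for each site. The sketch below is an illustration under stated assumptions rather than the method the challenge requires: it assumes the classifier exposes per-finding sigmoid probabilities, fits a single temperature by grid search on negative log-likelihood, and uses hypothetical function names and a hypothetical `(site, label, prob)` record layout.

```python
import math
from collections import defaultdict


def _logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # clip to keep the log finite
    return math.log(p / (1 - p))


def _sigmoid(z):
    return 1 / (1 + math.exp(-z))


def nll(labels, probs, eps=1e-12):
    """Mean binary negative log-likelihood."""
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(labels, probs)
    ) / len(labels)


def rescale(probs, temperature):
    """Divide each logit by the temperature and map back to a probability."""
    return [_sigmoid(_logit(p) / temperature) for p in probs]


def fit_temperature(labels, probs, grid=None):
    """Pick the temperature on a grid (0.25 to 6.0 by default) that minimises NLL."""
    grid = grid or [0.25 + 0.05 * k for k in range(116)]
    return min(grid, key=lambda t: nll(labels, rescale(probs, t)))


def fit_per_site(records):
    """records: iterable of (site, label, prob) tuples (assumed layout)."""
    by_site = defaultdict(lambda: ([], []))
    for site, y, p in records:
        by_site[site][0].append(y)
        by_site[site][1].append(p)
    return {site: fit_temperature(ys, ps) for site, (ys, ps) in by_site.items()}
```

A temperature above 1 softens overconfident predictions; below 1 it sharpens underconfident ones. Because the rescaling is monotonic, it leaves AUROC and AUPRC unchanged and only affects calibration. The temperature should be fitted on a calibration split that is separate from the images used to report the final ECE.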
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Safety Researcher
Per-site clinical-imaging audits with per-finding evaluation and mitigation proposals are a signature deliverable for AI safety researchers at consultancies serving healthcare networks.
This challenge sharpens
- drift-detection
- model-monitoring
- model-calibration
MLOps Engineer
Per-site drift monitoring and recalibration plans are core MLOps work for any multi-site clinical-AI deployment.
This challenge sharpens
- drift-detection
- model-monitoring
- model-evaluation
Applied AI Scientist
Turning per-site audit evidence into a committee-readable go/no-go plan is everyday work for applied AI scientists in clinical-AI consulting.
This challenge sharpens
- medical-imaging
- classification
- model-evaluation