Run a Pre-Deployment Fairness + Drift Audit on a Hiring Model
Overview
What this challenge is about.
You receive a trained classifier (a joblib file), a sample of the training data, and a held-out "next-month" evaluation set. Compute group fairness metrics (false-positive-rate gap, true-positive-rate gap, and calibration by group) across two declared protected attributes. Quantify input-distribution drift between the training sample and the held-out set using the Population Stability Index (PSI) and KL divergence on key features. Propose at least one mitigation for the largest fairness gap. Then draft a drift-monitoring plan that covers which features to watch, thresholds, alerting cadence, and escalation path. Close with a one-page executive summary.
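As a rough illustration of the fairness metrics named above, here is a minimal sketch using NumPy only. The function names, the toy arrays, the group labels "A" and "B", and the 0.5 decision threshold are all assumptions made for illustration; the challenge does not prescribe them. The calibration check is the coarsest possible one (mean predicted score minus observed positive rate per group), not a full reliability curve.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group false-positive rate and true-positive rate for binary labels."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups).tolist():
        in_group = groups == g
        negatives = in_group & (y_true == 0)
        positives = in_group & (y_true == 1)
        # FPR = share of true negatives predicted positive; TPR = share of true positives predicted positive
        fpr = float(y_pred[negatives].mean()) if negatives.any() else float("nan")
        tpr = float(y_pred[positives].mean()) if positives.any() else float("nan")
        rates[g] = {"fpr": fpr, "tpr": tpr}
    return rates

def rate_gap(rates, key):
    """Largest minus smallest value of one rate ('fpr' or 'tpr') across groups."""
    values = [r[key] for r in rates.values()]
    return float(np.nanmax(values) - np.nanmin(values))

def calibration_by_group(y_true, y_prob, groups):
    """Mean predicted score minus observed positive rate, per group (0 means calibrated on average)."""
    y_true, y_prob, groups = map(np.asarray, (y_true, y_prob, groups))
    return {
        g: float(y_prob[groups == g].mean() - y_true[groups == g].mean())
        for g in np.unique(groups).tolist()
    }

# Toy data (hypothetical): eight candidates, two groups of one protected attribute.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.8, 0.6, 0.1, 0.7, 0.4, 0.2, 0.1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
y_pred = (y_prob >= 0.5).astype(int)  # assumed decision threshold

rates = group_rates(y_true, y_pred, groups)
fpr_gap = rate_gap(rates, "fpr")
tpr_gap = rate_gap(rates, "tpr")
calibration = calibration_by_group(y_true, y_prob, groups)
print(rates, fpr_gap, tpr_gap, calibration)
```

In a real audit the same functions would be run once per protected attribute on the held-out set, with `y_prob` taken from the loaded classifier's predicted probabilities.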
The Brief
What you'll do, and what you'll demonstrate.
Run a defensible pre-deployment fairness and drift audit, paired with a six-month monitoring plan that non-technical legal counsel can sign off on.
Earning criteria — what you'll demonstrate
- Compute and interpret group-fairness metrics on a real classifier
- Apply drift-detection methods (PSI, KL) on tabular features
- Design a model-monitoring plan that maps thresholds to escalations
- Communicate audit findings to a legal/executive audience
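The drift-detection and threshold-to-escalation criteria above could be sketched as follows. This is one possible approach, not a prescribed one: it bins a numeric feature on the training sample's quantiles, then computes PSI and KL divergence on the binned proportions. The function names, the ten-bin choice, the smoothing constant, and the synthetic feature are assumptions. The 0.1 and 0.25 PSI cut-offs are a widely quoted rule of thumb rather than values from this challenge, and the escalation labels are hypothetical.

```python
import numpy as np

def binned_proportions(reference, current, n_bins=10, eps=1e-6):
    """Bin both samples on the reference sample's quantiles; return proportions (p, q)."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))
    interior = edges[1:-1]  # out-of-range values fall into the first or last bin
    ref_counts = np.bincount(np.digitize(reference, interior), minlength=len(interior) + 1)
    cur_counts = np.bincount(np.digitize(current, interior), minlength=len(interior) + 1)
    # Clip away empty bins so the logarithms below stay finite, then renormalize.
    p = np.clip(ref_counts / ref_counts.sum(), eps, None)
    q = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return p / p.sum(), q / q.sum()

def psi(p, q):
    """Population Stability Index: sum((q - p) * ln(q / p)); symmetric in p and q."""
    return float(np.sum((q - p) * np.log(q / p)))

def kl_divergence(p, q):
    """KL(p || q) with p as the reference (training) distribution."""
    return float(np.sum(p * np.log(p / q)))

def escalation_level(psi_value, warn=0.1, alert=0.25):
    """Map a PSI value to a hypothetical escalation label (rule-of-thumb thresholds)."""
    if psi_value >= alert:
        return "escalate"
    if psi_value >= warn:
        return "investigate"
    return "ok"

# Synthetic feature (hypothetical): the next-month sample is shifted by one unit.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)
next_month_feature = train_feature + 1.0

p, q = binned_proportions(train_feature, next_month_feature)
drift = psi(p, q)
print(drift, kl_divergence(p, q), escalation_level(drift))
```

A monitoring plan would typically run this per watched feature at each alerting interval and record which label was returned, so that each threshold has a named owner and response.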
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Safety Researcher
Pre-deployment audits paired with monitoring plans are a common entry-level deliverable for AI safety researchers at consultancies and on in-house responsible-AI teams.
This challenge sharpens
- fairness-metrics
- drift-detection
- bias-mitigation
MLOps Engineer
Designing the drift-monitoring plan that ops teams will run for the next six months is core MLOps work.
This challenge sharpens
- drift-detection
- model-monitoring
- python
Applied AI Scientist
Communicating model risk to executives in their own language is a skill applied AI scientists are often evaluated on in interviews.
This challenge sharpens
- model-evaluation
- fairness-metrics
- bias-mitigation