Build a Fairness Evaluation Harness for a Credit-Score Model
Overview
What this challenge is about.
Implement a Python module that, given model predictions, ground-truth labels, and group identifiers, computes four group-fairness metrics: demographic parity difference, equal-opportunity difference, predictive-parity difference, and false-positive-rate difference. Add bootstrap confidence intervals for each metric. Run the module on a synthetic credit-decision dataset (around 50,000 rows) with two intersecting protected attributes. Produce a 4-page evaluation report containing the metric tables, plus a 1-page methodology note explaining why each metric is reported and where it can mislead. Add unit tests covering edge cases (groups with no positive labels, very small groups).
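As a starting point, the four metrics named above can all be derived from per-group confusion-matrix counts. The sketch below is one possible shape, not a reference solution. The function names (`group_rates`, `parity_difference`) are hypothetical, and it assumes binary labels and predictions and defines each "difference" as the largest group rate minus the smallest. Comparing every group against a fixed reference group is an equally valid convention. Rates with a zero denominator, such as the true-positive rate of a group with no positive labels, are returned as NaN rather than 0 so they cannot pass silently as "fair".

```python
import numpy as np


def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return float(num) / float(den) if den > 0 else float("nan")


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR, and PPV (NaN where undefined).

    Assumes y_true and y_pred are binary (0/1 or bool) and that groups
    holds one group identifier per row.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    groups = np.asarray(groups)
    rates = {}
    for g in np.unique(groups).tolist():
        mask = groups == g
        t, p = y_true[mask], y_pred[mask]
        tp = np.sum(t & p)
        fp = np.sum(~t & p)
        fn = np.sum(t & ~p)
        tn = np.sum(~t & ~p)
        rates[g] = {
            "n": int(mask.sum()),
            "selection_rate": _safe_div(tp + fp, mask.sum()),  # demographic parity
            "tpr": _safe_div(tp, tp + fn),  # equal opportunity
            "ppv": _safe_div(tp, tp + fp),  # predictive parity
            "fpr": _safe_div(fp, fp + tn),  # false-positive-rate parity
        }
    return rates


def parity_difference(rates, key):
    """Max minus min of one rate across groups, skipping undefined (NaN) groups."""
    vals = [r[key] for r in rates.values() if not np.isnan(r[key])]
    return max(vals) - min(vals) if len(vals) >= 2 else float("nan")
```

Calling `parity_difference(group_rates(y_true, y_pred, groups), "tpr")` would then give the equal-opportunity difference. Reporting `n` alongside each rate makes very small groups visible in the metric tables.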
The Brief
What you'll do, and what you'll demonstrate.
Build a reusable fairness evaluation harness with multiple group metrics and bootstrap intervals, plus a release-ready evaluation report.
Earning criteria — what you'll demonstrate
- Implement multiple group-fairness metrics from first principles
- Apply bootstrap methods for honest confidence intervals
- Reason about intersecting protected attributes
- Communicate fairness results to a risk-team audience
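The bootstrap and intersectionality criteria above can be illustrated together. The following is a minimal sketch under stated assumptions, not a prescribed design: the helper names (`intersect_groups`, `demographic_parity_difference`, `bootstrap_ci`) are hypothetical, the interval is a plain percentile bootstrap that resamples rows with replacement, and the two attributes are combined by concatenating their labels. Other interval types (for example BCa) or stratified resampling may suit very small intersectional groups better.

```python
import numpy as np


def intersect_groups(attr_a, attr_b):
    """Combine two attribute arrays into one intersectional label per row."""
    return np.array([f"{a}|{b}" for a, b in zip(attr_a, attr_b)])


def demographic_parity_difference(y_pred, groups):
    """Largest minus smallest selection rate across the groups present."""
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))


def bootstrap_ci(metric_fn, y_pred, groups, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: returns (point estimate, lower, upper).

    Rows are resampled with replacement. A tiny group can drop out of a
    given resample entirely; here it is simply absent from that replicate.
    """
    rng = np.random.default_rng(seed)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    n = len(y_pred)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        stats[b] = metric_fn(y_pred[idx], groups[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return metric_fn(y_pred, groups), float(lo), float(hi)
```

A call might look like `bootstrap_ci(demographic_parity_difference, y_pred, intersect_groups(attr_a, attr_b))`. Fixing `seed` keeps the reported intervals reproducible between runs of the evaluation report.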
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Data Scientist
Shipping a reusable fairness harness with proper statistics is the data scientist's contribution to any regulated lending model release.
This challenge sharpens
- algorithmic-fairness
- statistical-evaluation
- model-evaluation
Machine Learning Engineer
Test-driven, edge-case-aware code is the MLE's craft when productionizing evaluation infrastructure.
This challenge sharpens
- python
- test-driven-development
- model-evaluation
AI Safety Researcher
Group fairness and intersectional analysis sit squarely in the safety researcher's responsible-AI portfolio.
This challenge sharpens
- algorithmic-fairness
- statistical-evaluation
- bootstrap-methods