Audit a Production Model for Membership Inference Attacks
Overview
What this challenge is about.
Run a black-box membership inference attack against the model (e.g., LiRA, the likelihood ratio attack, or a classic shadow-model attack). You have query access to a sandboxed copy of the model, plus the original training-data labels for training the attack. Quantify the attack's success with ROC-AUC and the true positive rate (TPR) at a 0.1% false positive rate (FPR). Stratify the results by subgroup (e.g., applicants near the decision boundary, who often leak more). Propose three mitigations (output truncation, prediction smoothing, and training with differential privacy), each with an effort estimate. Write a 5-page assessment for the compliance team.
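The metrics named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the challenge's reference implementation: the function names (`roc_curve_points`, `roc_auc`, `tpr_at_fpr`, `tpr_by_subgroup`) and the toy inputs are assumptions. It also assumes that a higher attack score means "more likely a training member" and that every subgroup contains both members and non-members.

```python
import numpy as np

def roc_curve_points(scores, is_member):
    """Return (fpr, tpr) arrays swept over every distinct score threshold."""
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    order = np.argsort(-scores, kind="stable")       # highest score first
    sorted_scores = scores[order]
    sorted_member = is_member[order]
    tp = np.cumsum(sorted_member)                    # members flagged so far
    fp = np.cumsum(~sorted_member)                   # non-members flagged so far
    # Keep one point per distinct score so tied scores share a threshold.
    last = np.r_[np.nonzero(np.diff(sorted_scores))[0], len(scores) - 1]
    tpr = np.r_[0.0, tp[last] / is_member.sum()]
    fpr = np.r_[0.0, fp[last] / (~is_member).sum()]
    return fpr, tpr

def roc_auc(scores, is_member):
    """Area under the ROC curve via the trapezoid rule."""
    fpr, tpr = roc_curve_points(scores, is_member)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

def tpr_at_fpr(scores, is_member, target_fpr=0.001):
    """Best TPR achievable without exceeding target_fpr (default 0.1%)."""
    fpr, tpr = roc_curve_points(scores, is_member)
    return float(tpr[fpr <= target_fpr].max())

def tpr_by_subgroup(scores, is_member, groups, target_fpr=0.001):
    """TPR at the target FPR, computed separately inside each subgroup."""
    scores, is_member, groups = map(np.asarray, (scores, is_member, groups))
    return {
        str(g): tpr_at_fpr(scores[groups == g], is_member[groups == g], target_fpr)
        for g in np.unique(groups)
    }

# Toy data (hypothetical): six attack scores with ground-truth membership.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
members = [1, 1, 0, 1, 0, 0]
groups = ["near", "far", "near", "far", "near", "far"]
print(roc_auc(scores, members), tpr_at_fpr(scores, members))
print(tpr_by_subgroup(scores, members, groups))
```

One practical caveat: the smallest nonzero FPR you can measure is one over the number of non-members, so a 0.1% FPR figure only becomes meaningful with at least about a thousand non-member examples per subgroup. With fewer, `tpr_at_fpr` effectively reports the TPR at zero false positives.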
The Brief
What you'll do, and what you'll demonstrate.
Quantify membership-inference leakage from a credit-decisioning model and rank mitigations by cost for compliance review.
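To make the cheapest of the mitigations concrete, output truncation can be sketched as a thin wrapper around the model's returned probability. This is an illustrative sketch only: the function names, the rounding precision, and the 0.5 approval threshold are assumptions, not details of the production model.

```python
import numpy as np

def truncate_output(probs, decimals=1):
    """Round returned probabilities to a coarse grid so fine-grained
    confidence differences are not exposed to the caller."""
    return np.round(np.asarray(probs, dtype=float), decimals)

def label_only(probs, threshold=0.5):
    """Most aggressive truncation: return only the decision, no score.
    The 0.5 threshold is a placeholder assumption."""
    return (np.asarray(probs, dtype=float) >= threshold).astype(int)

raw = [0.9731, 0.5149, 0.4149]       # hypothetical model confidences
print(truncate_output(raw))           # coarse scores
print(label_only(raw))                # decisions only
```

Truncation is cheap to deploy because it leaves training untouched, but it is worth re-running the attack against the truncated outputs before claiming a benefit, since it may reduce leakage less than expected.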
Earning criteria — what you'll demonstrate
- Implement state-of-the-art membership-inference attacks
- Quantify leakage with proper metrics (TPR @ low FPR, not just AUC)
- Audit a model for privacy risk across subgroups
- Communicate privacy risk to a compliance audience
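For orientation, the per-example scoring step of a LiRA-style attack can be sketched as follows. This is a simplified illustration under stated assumptions: you have already trained shadow models and recorded, for one target example, the confidence on its true label from shadow models that included it (`in_confs`) and from those that excluded it (`out_confs`). All names and numbers here are hypothetical.

```python
import numpy as np

def logit_scale(p, eps=1e-6):
    """Map confidence p to log(p / (1 - p)), which is closer to Gaussian."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p) - np.log1p(-p)

def gaussian_logpdf(x, mean, std):
    """Log-density of a normal distribution, written out with NumPy."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def lira_score(target_conf, in_confs, out_confs, min_std=1e-3):
    """Log-likelihood ratio: positive values favour 'was a training member'."""
    x = logit_scale(target_conf)
    phi_in = logit_scale(in_confs)
    phi_out = logit_scale(out_confs)
    std_in = max(float(phi_in.std()), min_std)     # guard against zero variance
    std_out = max(float(phi_out.std()), min_std)
    return float(
        gaussian_logpdf(x, phi_in.mean(), std_in)
        - gaussian_logpdf(x, phi_out.mean(), std_out)
    )

# Hypothetical shadow-model confidences for one applicant record.
in_confs = [0.98, 0.97, 0.99]    # shadow models trained WITH the record
out_confs = [0.60, 0.70, 0.50]   # shadow models trained WITHOUT it
print(lira_score(0.99, in_confs, out_confs))   # positive: looks like a member
print(lira_score(0.60, in_confs, out_confs))   # negative: looks like a non-member
```

The resulting scores are what you threshold to obtain the TPR at low FPR. In practice you would use many more shadow models per example than the three shown here, since the Gaussian fits are unreliable on so few points.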
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Safety Researcher
Running real privacy attacks on production models and writing up the compliance assessment is the kind of AI safety research that teams subject to the EU AI Act urgently need.
This challenge sharpens
- membership-inference
- privacy-attacks
- risk-assessment
ML Researcher
Privacy-attack research is a growing ML-research subfield with direct relevance to deployed-AI teams.
This challenge sharpens
- membership-inference
- shadow-models
- model-evaluation
AI Solutions Architect
Translating privacy risk into ranked mitigation proposals is the architectural work AI solutions architects do for customers in regulated industries.
This challenge sharpens
- risk-assessment
- privacy-attacks
- membership-inference