Red-Team an Image-Classification Pipeline for a Banking KYC Workflow
Overview
What this challenge is about.
You receive the production image classifier as a black-box API, plus a labeled validation set of 5,000 ID images. Run untargeted FGSM and PGD attacks at L∞ budgets of 4/255 and 8/255, and a transfer attack from a surrogate model. Quantify robust accuracy and identify the three attack patterns the model is most vulnerable to. Deliver the attack notebooks, a robust-accuracy table, a failure-mode gallery, and a 4-page risk memo with three concrete remediation recommendations.
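FGSM and PGD both need input gradients, and a black-box API does not expose them. One option is to estimate gradients from queries (not shown). The other option, assumed here, is to compute them on a differentiable surrogate and then send the resulting images to the API, which is also the transfer setting. The toy NumPy sketch below is a sketch under stated assumptions, not a definitive implementation. It uses a softmax-regression surrogate in place of a real CNN. `query_api`, `W_api`, and `W_sur` are hypothetical stand-ins for the production endpoint and your surrogate. FGSM takes a single signed-gradient step of size ε. PGD takes several smaller steps from a random start, and after each step it projects back into the L∞ ball of radius ε and into the valid pixel range.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def input_grad(W, b, x, y):
    """Per-example gradient of cross-entropy w.r.t. the inputs of a softmax-regression surrogate."""
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0  # dLoss/dlogits = softmax - one_hot(y)
    return p @ W.T  # chain rule through logits = x @ W + b


def fgsm(W, b, x, y, eps):
    """One step of size eps along the gradient sign, clipped to valid pixels [0, 1]."""
    return np.clip(x + eps * np.sign(input_grad(W, b, x, y)), 0.0, 1.0)


def pgd(W, b, x, y, eps, steps=10, seed=0):
    """Iterated signed-gradient steps from a random start, projected onto the L_inf ball."""
    alpha = 2.5 * eps / steps  # a common step-size heuristic (assumption)
    rng = np.random.default_rng(seed)
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(input_grad(W, b, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid pixel range
    return x_adv


# Toy setup (all hypothetical): 8x8 "images" flattened to 64 features, 3 classes.
rng = np.random.default_rng(0)
d, k, n = 64, 3, 200
W_api, b_api = rng.normal(size=(d, k)), np.zeros(k)  # hidden "production" model
W_sur, b_sur = W_api + 0.5 * rng.normal(size=(d, k)), np.zeros(k)  # imperfect surrogate
x = rng.uniform(size=(n, d))


def query_api(images):
    """Hypothetical black-box endpoint: returns predicted labels only, no gradients."""
    return (images @ W_api + b_api).argmax(axis=1)


y = query_api(x)  # toy labels, so clean accuracy is 100% by construction

for eps in (4 / 255, 8 / 255):
    for name, attack in (("FGSM", fgsm), ("PGD", pgd)):
        x_adv = attack(W_sur, b_sur, x, y, eps)  # crafted on the surrogate only
        acc = (query_api(x_adv) == y).mean()     # evaluated on the black box
        print(f"{name} eps={eps * 255:.0f}/255 transfer robust accuracy: {acc:.3f}")
```

Swapping the toy surrogate for a real PyTorch model changes only `input_grad`. The sign step, the projection, and the clipping stay the same.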
The Brief
What you'll do, and what you'll demonstrate.
Quantify the production KYC image classifier's robustness to standard adversarial attacks and recommend three remediations.
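A robust-accuracy table is more informative when each budget also reports a per-image worst case across attacks. Under that rule, an image counts as robust only if every attack at that budget leaves it correctly classified. The pure-Python sketch below assumes you already have predicted labels for each (attack, budget) pair, for example from the API. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def robust_accuracy_table(labels, preds_by_attack):
    """Per-budget accuracy for each attack plus a per-image worst case across attacks.

    labels: true labels, one per validation image.
    preds_by_attack: {(attack_name, budget): predicted labels on that attack's images}.
    Returns {budget: {attack_name: accuracy, ..., "worst-case": accuracy}}.
    """
    n = len(labels)
    table = defaultdict(dict)
    still_correct = {}  # budget -> per-image flag: correct under every attack seen so far
    for (attack, budget), preds in preds_by_attack.items():
        correct = [p == t for p, t in zip(preds, labels)]
        table[budget][attack] = sum(correct) / n
        prev = still_correct.get(budget, [True] * n)
        still_correct[budget] = [a and b for a, b in zip(prev, correct)]
    for budget, flags in still_correct.items():
        table[budget]["worst-case"] = sum(flags) / n
    return dict(table)


labels = [0, 1, 2, 1]
preds = {
    ("FGSM", "4/255"): [0, 1, 0, 1],
    ("PGD", "4/255"): [0, 2, 2, 1],
}
print(robust_accuracy_table(labels, preds)["4/255"])
# → {'FGSM': 0.75, 'PGD': 0.75, 'worst-case': 0.5}
```

Each attack alone leaves 75% of the images correct here, but only 50% survive both attacks. That gap is the reason to report the worst case. Adding the clean predictions as an extra entry under each budget also makes clean errors count as non-robust.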
Earning criteria — what you'll demonstrate
- Implement and apply standard adversarial attacks to a real classifier
- Quantify robust accuracy at standard threat-model budgets
- Identify failure patterns and translate them into remediation plans
- Communicate adversarial-robustness findings to a non-research risk committee
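The brief does not define "attack pattern." One simple reading, assumed here, is a recurring label flip: a true class that the attack reliably pushes into another class. The sketch below counts flips only on images the model classified correctly before the attack, so clean errors do not inflate the counts. The function name and the document class names in the usage example are hypothetical. To group by attack, budget, or another per-image attribute, keep the same counting and change the key.

```python
from collections import Counter


def top_failure_patterns(labels, clean_preds, adv_preds, k=3):
    """Return the k most frequent (true, adversarial) label flips with their share of all flips.

    Only images classified correctly before the attack and incorrectly after it count as flips.
    """
    flips = Counter(
        (t, a)
        for t, c, a in zip(labels, clean_preds, adv_preds)
        if c == t and a != t
    )
    total = sum(flips.values())
    return [(pair, count, count / total) for pair, count in flips.most_common(k)]


# Hypothetical document classes, for illustration only.
labels = ["passport", "passport", "id_card", "id_card", "license", "passport"]
clean = ["passport", "passport", "id_card", "license", "license", "passport"]
adv = ["id_card", "id_card", "license", "license", "license", "passport"]
for (true_cls, adv_cls), count, share in top_failure_patterns(labels, clean, adv):
    print(f"{true_cls} -> {adv_cls}: {count} ({share:.0%} of flips)")
# → passport -> id_card: 2 (67% of flips)
# → id_card -> license: 1 (33% of flips)
```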
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Safety Researcher
Red-teaming a production classifier with rigorous robust-accuracy measurement and a remediation memo is a textbook project for an AI safety researcher working on any regulated AI deployment.
This challenge sharpens
- adversarial-attacks
- robust-evaluation
- red-teaming
ML Researcher
Implementing and benchmarking adversarial attacks against a real model with publication-grade rigor is core ML-researcher craft.
This challenge sharpens
- adversarial-attacks
- robust-evaluation
- pytorch
Applied AI Scientist
Translating attack findings into a remediation plan that engineering can ship is daily work for an applied AI scientist in a regulated industry.
This challenge sharpens
- red-teaming
- risk-assessment
- robust-evaluation