Investigate Why Our Generative Model Memorizes Training Data
Overview
What this challenge is about.
Pick a small open-source diffusion model (e.g., a Stable Diffusion-class community model trained on a LAION subset). Reproduce a published, Carlini-style membership-inference and extraction probe on it. Then test three mitigations: (a) text-prompt deduplication, (b) a higher classifier-free guidance scale, and (c) lightweight differential-privacy fine-tuning on a small calibration set. Report the extraction-success rate before and after each mitigation, and document the assumptions behind each number. Finally, write a 4-page memo for a non-research policy reader explaining what was found, what wasn't, and what the team should do next.
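As a rough illustration of the extraction probe, the sketch below flags a prompt as likely memorized when many independently seeded generations collapse onto nearly the same image. It is a simplification, not the published method: Carlini-style attacks generate hundreds of samples per prompt and look for a large clique under a patch-based distance, while this version just counts near neighbors under a plain RMS pixel distance. The function name `looks_memorized` and both threshold values are hypothetical placeholders that would need calibrating against the model under test.

```python
import numpy as np

def looks_memorized(generations, dist_threshold=0.1, min_neighbors=10):
    """Flag a prompt when one generation has many near-identical siblings.

    generations: array of shape (n, H, W, C) with values in [0, 1], holding
    samples drawn for the SAME prompt with different random seeds.
    dist_threshold and min_neighbors are illustrative, uncalibrated values.
    """
    n = len(generations)
    flat = np.asarray(generations, dtype=np.float64).reshape(n, -1)

    # Pairwise squared L2 distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq_norms = np.sum(flat ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (flat @ flat.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from rounding

    # Convert to a per-pixel RMS distance so the threshold is resolution-independent.
    dist = np.sqrt(sq_dists / flat.shape[1])
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Simplification: count near neighbors instead of finding the largest clique.
    neighbor_counts = np.sum(dist < dist_threshold, axis=1)
    return bool(neighbor_counts.max() >= min_neighbors)
```

Running this once per prompt, before and after a mitigation, gives the per-prompt yes/no outcomes that the extraction-success rate is computed from. A flagged prompt is only a candidate: confirming memorization still requires matching the generation against the training set.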
The Brief
What you'll do, and what you'll demonstrate.
Quantify how much training-data memorization a small open diffusion model exhibits and how well standard mitigations work.
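One way to make "how much" and "how well" concrete is to report each extraction-success rate with an uncertainty interval, since extraction is rare and the raw rate alone can mislead a policy reader. The sketch below uses a Wilson score interval, which is one reasonable choice for small counts rather than a requirement of the challenge. The helper names `wilson_interval` and `format_report` are hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def format_report(results):
    """results maps a condition name to (prompts_extracted, prompts_tested)."""
    lines = []
    for name, (extracted, tested) in results.items():
        low, high = wilson_interval(extracted, tested)
        lines.append(
            f"{name}: {extracted}/{tested} = {extracted / tested:.1%} "
            f"(95% CI {low:.1%} to {high:.1%})"
        )
    return lines
```

Calling `format_report` with one entry for the baseline and one per mitigation yields a line per condition for the memo. If two intervals overlap heavily, the honest reading is that the probe could not distinguish the mitigation from the baseline at that sample size.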
Earning criteria — what you'll demonstrate
- Reproduce a published safety result on a real model
- Reason about the assumptions baked into extraction-attack methodologies
- Evaluate the cost/benefit of common memorization mitigations
- Communicate safety findings to a non-research policy audience
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Articulating a threat model and tracking your assumptions are what separate a citable ML research project from a vibes-based one.
This challenge sharpens
- evaluation
- generative-models
- safety-research