Evaluate VAEs vs. Diffusion for Synthetic Tabular-Data Generation
Overview
What this challenge is about.
You receive a real labeled dataset (around 18,000 anonymized patient records, 32 features, binary outcome) and the team's existing VAE baseline. Train a tabular diffusion model (TabDDPM or similar) on the same data. Evaluate both generators on fidelity (per-column marginal similarity and pairwise-correlation similarity), utility (AUC of a downstream classifier trained on synthetic data and tested on real data), and privacy (success rate of a membership-inference attack). Recommend which generator to ship, and write a 3-page privacy and recommendation memo for the head of platform.
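One way to make the two fidelity measures concrete is sketched below. It is a minimal illustration, not the challenge's official scoring. It assumes every column is numeric (categorical columns would need a separate measure such as total variation distance), scores marginals as one minus the two-sample Kolmogorov-Smirnov statistic, and scores correlation similarity from the mean absolute gap between the two correlation matrices. The function names `ks_statistic` and `fidelity_report` are hypothetical.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def fidelity_report(real, synth):
    """Return marginal and correlation similarity, each in [0, 1] (1 = identical).

    Assumption: both inputs are 2-D numeric arrays with the same columns,
    and no column is constant (a constant column makes the correlation NaN).
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    n_cols = real.shape[1]

    # Per-column marginal similarity: 1 - KS statistic, averaged over columns.
    marginal = np.mean(
        [1.0 - ks_statistic(real[:, j], synth[:, j]) for j in range(n_cols)]
    )

    # Pairwise-correlation similarity: compare off-diagonal Pearson correlations.
    corr_real = np.corrcoef(real, rowvar=False)
    corr_synth = np.corrcoef(synth, rowvar=False)
    upper = np.triu_indices(n_cols, k=1)
    gap = np.abs(corr_real - corr_synth)[upper].mean()

    # Correlation differences lie in [0, 2], so divide by 2 to rescale to [0, 1].
    return {"marginal": float(marginal), "correlation": float(1.0 - gap / 2.0)}
```

Calling `fidelity_report(real_array, vae_samples)` and `fidelity_report(real_array, diffusion_samples)` on the same real data gives two directly comparable score pairs.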
The Brief
What you'll do, and what you'll demonstrate.
Compare a tabular diffusion model with a VAE baseline on synthetic patient-record generation across fidelity, utility, and privacy.
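The utility axis of this comparison is often called train-on-synthetic, test-on-real (TSTR). The sketch below shows the idea using only NumPy: a small logistic regression is fitted on synthetic rows and scored by AUC on held-out real rows. The classifier, its hyperparameters, and the names `fit_logreg`, `roc_auc`, and `tstr_auc` are assumptions for illustration; in practice you would likely swap in a stronger library classifier and standardize the features first.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    # Ties count as half a win.
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))


def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression; returns weights (bias last)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def tstr_auc(X_synth, y_synth, X_real_test, y_real_test):
    """Train on synthetic rows, report AUC on real held-out rows."""
    w = fit_logreg(np.asarray(X_synth, float), np.asarray(y_synth, float))
    Xb = np.hstack([np.asarray(X_real_test, float), np.ones((len(X_real_test), 1))])
    return roc_auc(y_real_test, Xb @ w)
```

Running `tstr_auc` once per generator against the same real test split, alongside a train-on-real reference, shows how much downstream performance each generator gives up.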
Earning criteria — what you'll demonstrate
- Train tabular diffusion and VAE generators on real data
- Evaluate synthetic data across fidelity, utility, and privacy
- Run a basic membership-inference attack as privacy evaluation
- Communicate privacy trade-offs to platform leadership
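A basic membership-inference attack of the kind listed above can be sketched with a distance-to-closest-record heuristic: if a real record sits unusually close to some synthetic record, the attacker guesses it was in the training set. This is one simple attack among many, not necessarily the one the challenge prescribes. The sketch assumes numeric, comparably scaled features and equal-sized member and non-member sets, so that 0.5 is chance level; `nearest_distance` and `mia_success_rate` are hypothetical names.

```python
import numpy as np


def nearest_distance(candidates, synth, batch=512):
    """Euclidean distance from each candidate row to its closest synthetic row."""
    candidates = np.asarray(candidates, dtype=float)
    synth = np.asarray(synth, dtype=float)
    synth_sq = (synth ** 2).sum(axis=1)
    out = np.empty(len(candidates))
    # Process in batches so the distance matrix stays small in memory.
    for start in range(0, len(candidates), batch):
        chunk = candidates[start:start + batch]
        d2 = (chunk ** 2).sum(axis=1)[:, None] + synth_sq[None, :] - 2.0 * chunk @ synth.T
        out[start:start + batch] = np.sqrt(np.clip(d2, 0.0, None).min(axis=1))
    return out


def mia_success_rate(members, non_members, synth):
    """Fraction of correct member / non-member guesses (0.5 = chance).

    members:     real rows the generator was trained on
    non_members: real held-out rows the generator never saw
    Assumption: the two sets have the same size.
    """
    d_in = nearest_distance(members, synth)
    d_out = nearest_distance(non_members, synth)
    # Guess "member" when the closest synthetic record is nearer than the median.
    threshold = np.median(np.concatenate([d_in, d_out]))
    correct = np.sum(d_in <= threshold) + np.sum(d_out > threshold)
    return float(correct / (len(d_in) + len(d_out)))
```

A success rate near 0.5 suggests this particular attack learns little from the synthetic data, while a rate well above 0.5 points to memorized training records. A low score here does not rule out stronger attacks.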
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Research Scientist
Running a tabular-generator comparison that evaluates privacy, utility, and fidelity is typical day-one work for a research scientist on a healthtech or privacy-AI team.
This challenge sharpens
- tabular-diffusion
- vae
- synthetic-data
AI Safety Researcher
Implementing a membership-inference attack as part of privacy evaluation is core AI safety work in regulated-data settings.
This challenge sharpens
- privacy-evaluation
- evaluation
- synthetic-data
Data Scientist
Comparing two generators on downstream utility, measured against real data, transfers directly to data-science roles where synthetic data unblocks collaboration.
This challenge sharpens
- evaluation
- vae
- pytorch