Evaluate VAEs vs. Diffusion for Synthetic Tabular-Data Generation

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive a real labeled dataset (about 18,000 anonymized patient records, 32 features, binary outcome) and the team's existing VAE baseline. Train a tabular diffusion model (TabDDPM or similar) on the same data, then evaluate both generators on three axes: fidelity (per-column marginal similarity and pairwise-correlation similarity), utility (AUC of a downstream classifier trained on synthetic data and tested on real data), and privacy (success rate of a membership-inference attack). Recommend which generator to ship, and write a 3-page privacy and recommendation memo for the head of platform.
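One possible way to score the fidelity axis is sketched below. It is a minimal illustration under stated assumptions, not the challenge's official metric: rows are plain lists of numeric values, marginal similarity is taken as one minus the mean total variation distance between binned columns, and correlation similarity as one minus half the mean absolute gap between pairwise Pearson correlations. The function names (`total_variation`, `pearson`, `fidelity_report`) and the 10-bin default are hypothetical choices made for this example.

```python
import math


def total_variation(real_col, synth_col, bins=10):
    """Total variation distance between two columns, histogrammed on shared bins."""
    lo = min(min(real_col), min(synth_col))
    hi = max(max(real_col), max(synth_col))
    width = (hi - lo) / bins or 1.0  # constant columns fall into a single bin

    def hist(col):
        counts = [0] * bins
        for v in col:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(col) for c in counts]

    p, q = hist(real_col), hist(synth_col)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def fidelity_report(real_rows, synth_rows, bins=10):
    """Return both fidelity scores in [0, 1]; 1.0 means indistinguishable."""
    real_cols = list(zip(*real_rows))
    synth_cols = list(zip(*synth_rows))
    n_cols = len(real_cols)
    tvds = [total_variation(r, s, bins) for r, s in zip(real_cols, synth_cols)]
    gaps = [
        abs(pearson(real_cols[i], real_cols[j]) - pearson(synth_cols[i], synth_cols[j]))
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
    ]
    return {
        "marginal_similarity": 1.0 - sum(tvds) / len(tvds),
        # A correlation gap is at most 2, so halving keeps the score in [0, 1].
        "correlation_similarity": 1.0 - (sum(gaps) / len(gaps)) / 2.0 if gaps else 1.0,
    }
```

Calling `fidelity_report(real_rows, vae_rows)` and `fidelity_report(real_rows, diffusion_rows)` would give two comparable score pairs. Categorical columns would need their own treatment (for example, category frequencies instead of bins), which this sketch leaves out.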

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Compare a tabular diffusion model with a VAE baseline on synthetic patient-record generation across fidelity, utility, and privacy.
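The utility axis (train on synthetic, test on real) can be illustrated with a small sketch. Everything here is an assumption made for the example: a nearest-centroid scorer stands in for whatever downstream classifier the team actually uses, labels are 0/1 integers, and the helper names (`auc`, `centroid`, `sq_dist`, `tstr_auc`) are hypothetical. The AUC is computed directly from its definition as the probability that a positive outranks a negative, which is quadratic in the number of rows and only suitable for small samples.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tstr_auc(synth_X, synth_y, real_X, real_y):
    """Fit class centroids on synthetic data, then score real rows and report AUC."""
    c0 = centroid([x for x, y in zip(synth_X, synth_y) if y == 0])
    c1 = centroid([x for x, y in zip(synth_X, synth_y) if y == 1])
    # Higher score means the row sits closer to the positive-class centroid.
    scores = [sq_dist(x, c0) - sq_dist(x, c1) for x in real_X]
    return auc(scores, real_y)
```

Running `tstr_auc` once per generator, against the same held-out real rows, gives two directly comparable numbers. A useful reference point is the same classifier trained on real data, since that bounds what synthetic training data can be expected to reach.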

Earning criteria — what you'll demonstrate

Train tabular diffusion and VAE generators on real data
Evaluate synthetic data across fidelity, utility, and privacy
Run a basic membership-inference attack as privacy evaluation
Communicate privacy trade-offs to platform leadership
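A basic membership-inference attack of the kind named in the criteria above could look like the following distance-to-closest-record sketch. It is one simple attack among many and is not claimed to be the one the challenge grades against. The assumptions: records are numeric vectors on comparable scales, the attacker holds a balanced candidate set of known training members and held-out non-members, and the decision threshold is the median distance over that set. `nearest_distance` and `mia_success_rate` are hypothetical names.

```python
import math
import statistics


def nearest_distance(record, synthetic):
    """Euclidean distance from a record to its closest synthetic row."""
    return min(math.dist(record, s) for s in synthetic)


def mia_success_rate(members, non_members, synthetic):
    """Accuracy of guessing 'member' when a record lies unusually close to synthetic data."""
    candidates = [(r, True) for r in members] + [(r, False) for r in non_members]
    dists = [nearest_distance(r, synthetic) for r, _ in candidates]
    # Assumption: the attacker thresholds at the median distance of the candidate set.
    threshold = statistics.median(dists)
    correct = sum(
        (d <= threshold) == is_member
        for d, (_, is_member) in zip(dists, candidates)
    )
    return correct / len(candidates)
```

On a balanced candidate set, a rate near 0.5 means the attack does no better than guessing, while a rate well above 0.5 suggests the generator is reproducing training records too closely. Comparing the rate for the VAE and the diffusion model on the same candidate set keeps the privacy comparison like for like.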

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Generative AI

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Research Scientist
AI Research

Research Scientist

Comparing tabular generators on privacy, utility, and fidelity is typical early work for a research scientist on a healthtech or privacy-AI team.

This challenge sharpens

tabular-diffusion
vae
synthetic-data

AI Safety Researcher

Implementing a membership-inference attack as part of privacy evaluation is core AI safety work in regulated-data settings.

This challenge sharpens

privacy-evaluation
evaluation
synthetic-data

Data Scientist

Comparing two generators on real downstream utility transfers directly to data-science roles where synthetic data unblocks collaboration.

This challenge sharpens

evaluation
vae
pytorch

One more thing

You can put a credential on your CV by Friday.
