
Generate Synthetic Tabular Data with Privacy Guarantees

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

Implement differentially private (DP) synthetic data generation using one of DP-CTGAN, PATE-GAN, or a marginal-based DP method such as PrivBayes or MWEM. Train on the real dataset (around 200,000 transactions, 18 features) with a privacy budget of epsilon = 4. Evaluate utility on three tasks: (1) marginal preservation (Kolmogorov-Smirnov test per feature), (2) downstream-classifier accuracy (train on synthetic, test on real), and (3) correlation preservation. Run a membership-inference attack on the generator to validate the privacy claim. Write a 4-page data-sharing brief plus the technical appendix that the legal team attaches to the data-sharing agreement.
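To make the marginal-based route concrete, here is a minimal sketch of its simplest building block: one noisy one-way marginal released with the Laplace mechanism, then sampled. It is not PrivBayes or MWEM, both of which also model dependencies between features. Everything named here is an assumption for illustration: the three-value `domain` stands in for one of the 18 features, the budget is split evenly across features under basic sequential composition, and neighbouring datasets are taken to differ by adding or removing one record.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exponential(1) draws, times `scale`,
    # follows a Laplace(0, scale) distribution.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_marginal(values, domain, epsilon, rng):
    """Noisy one-way marginal over a fixed, publicly known domain."""
    counts = Counter(values)
    # Adding or removing one record changes exactly one count by 1,
    # so the L1 sensitivity of the histogram is 1 and the scale is 1/epsilon.
    noisy = [
        max(counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng), 0.0)
        for v in domain
    ]
    total = sum(noisy)
    if total == 0:
        return [1.0 / len(domain)] * len(domain)
    # Clipping and normalising are post-processing and cost no extra budget.
    return [c / total for c in noisy]


def sample_synthetic(domain, probs, n, rng):
    return rng.choices(domain, weights=probs, k=n)


# NOTE: `random` is fine for a sketch but is not a secure noise source.
rng = random.Random(0)

# Hypothetical categorical feature standing in for one real column.
domain = ["card", "transfer", "cash"]
real = rng.choices(domain, weights=[0.6, 0.3, 0.1], k=200_000)

total_epsilon = 4.0
n_features = 18
eps_per_feature = total_epsilon / n_features  # basic sequential composition

probs = dp_marginal(real, domain, eps_per_feature, rng)
synthetic = sample_synthetic(domain, probs, 1000, rng)
```

With 200,000 records, noise of scale 18/4 = 4.5 per count is negligible for a single marginal. The harder part of the challenge is spending the same budget on joint structure, which is where methods like PrivBayes and MWEM differ from this sketch.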

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Generate differentially private synthetic transaction data with demonstrated utility and a privacy claim that survives a membership-inference attack.
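One common way to test whether a privacy claim "survives" is a distance-to-closest-record attack: score each candidate record by how close it sits to the synthetic data, then measure how well that score separates training members from held-out non-members. The sketch below is an assumption about what such a check could look like, not the attack the challenge prescribes. The two-dimensional Gaussian records and the two toy "generators" (`leaky_synth`, `fresh_synth`) are hypothetical stand-ins.

```python
import random


def min_distance(record, synthetic):
    # Euclidean distance from `record` to its nearest synthetic record.
    return min(
        sum((a - b) ** 2 for a, b in zip(record, s)) for s in synthetic
    ) ** 0.5


def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as half a win. 0.5 means the attack is no better than chance.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


rng = random.Random(1)


def draw(n):
    # Hypothetical 2-feature records drawn from one shared distribution.
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


members = draw(200)      # stand-in for records the generator was trained on
nonmembers = draw(200)   # holdout records from the same distribution

leaky_synth = list(members)  # a generator that memorised its training set
fresh_synth = draw(200)      # a generator that reveals nothing record-specific


def mia_auc(synthetic):
    # Closer to the synthetic data means a higher membership score.
    member_scores = [-min_distance(r, synthetic) for r in members]
    nonmember_scores = [-min_distance(r, synthetic) for r in nonmembers]
    return attack_auc(member_scores, nonmember_scores)


leaky_auc = mia_auc(leaky_synth)
fresh_auc = mia_auc(fresh_synth)
```

An AUC near 0.5 is consistent with the privacy claim but does not prove it. For a pure epsilon-DP generator, any attack's true-positive rate is bounded by e^epsilon times its false-positive rate, and at epsilon = 4 that factor is about 54.6, so the empirical attack is best read as a sanity check against implementation bugs rather than as a replacement for the privacy accounting.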

Earning criteria — what you'll demonstrate

  • Apply DP to generative models with proper privacy accounting
  • Evaluate synthetic data with multiple utility metrics
  • Validate privacy claims with empirical attacks
  • Communicate synthetic-data privacy to a legal audience
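For the utility criterion, the per-feature Kolmogorov-Smirnov check can be sketched without any dependencies. The function below computes the two-sample KS statistic, the largest gap between the empirical CDFs of a real and a synthetic column. The column names and values in `real` and `synth` are hypothetical placeholders, not data from the challenge.

```python
def ks_statistic(x, y):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both pointers past every value <= v so ties are handled.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


# Hypothetical numeric columns; real data would have 18 features.
real = {"amount": [10, 20, 30, 40], "hour": [1, 9, 13, 22]}
synth = {"amount": [12, 18, 33, 41], "hour": [2, 8, 14, 21]}

# One KS statistic per feature: 0 means identical empirical distributions,
# 1 means the two samples do not overlap at all.
report = {name: ks_statistic(real[name], synth[name]) for name in real}
```

The KS statistic only covers marginals. The other two utility tasks, downstream-classifier accuracy and correlation preservation, need their own measurements, since a generator can match every marginal while destroying the relationships between features.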

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Safety Researcher

DP synthetic-data generation with empirical privacy validation is the AI safety work that fintechs and healthtechs need for safe data sharing.

This challenge sharpens

  • differential-privacy
  • synthetic-data
  • privacy-validation

ML Researcher

DP generative modeling is an active research area with direct industry application; this challenge gives you a project in publishable shape.

This challenge sharpens

  • synthetic-data
  • generative-models
  • utility-evaluation

Data Scientist

Synthetic data is increasingly a data scientist's tool for safe collaboration; this challenge teaches when synthetic data is honest and when it is not.

This challenge sharpens

  • synthetic-data
  • utility-evaluation
  • generative-models
