
Variational Autoencoder for Synthetic Tabular Banking Data

Free · Verified credential · 2 weeks · Advanced

Overview

What this challenge is about.

You receive a 500K-row anonymized transaction dataset with 25 columns (mixed numerical and categorical). Train a VAE (TabVAE or a small custom model) with an appropriate likelihood for each column type, then generate a 500K-row synthetic dataset. Evaluate utility via the 'train-on-synthetic, test-on-real' (TSTR) accuracy of a downstream gradient-boosted classifier predicting a held-out fraud label. Evaluate privacy via membership inference attack (MIA) AUC and nearest-neighbor distance ratio (NNDR). Compare the VAE to the histogram baseline on both axes, and recommend in a 2-page report whether it is good enough to ship to partners.
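To make "appropriate likelihoods per column type" concrete, here is a minimal sketch of the per-row VAE loss (negative ELBO), assuming a Gaussian likelihood for numerical columns and a softmax (categorical) likelihood for categorical ones. The challenge does not prescribe these choices, and the column names `amount` and `channel`, as well as the dictionary layout of the decoder output, are hypothetical. It uses only the standard library so the arithmetic is visible; a real model would compute the same terms in batches with PyTorch.

```python
import math

def gaussian_nll(x, mean, log_var):
    """Negative log-likelihood of x under N(mean, exp(log_var))."""
    return 0.5 * (math.log(2 * math.pi) + log_var + (x - mean) ** 2 / math.exp(log_var))

def categorical_nll(index, logits):
    """Negative log-likelihood of category `index` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[index]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def negative_elbo(row, decoder_out, mu, log_var):
    """Reconstruction NLL summed over columns, plus the KL term."""
    recon = 0.0
    for name, value in row.items():
        params = decoder_out[name]
        if "logits" in params:   # categorical column (assumed layout)
            recon += categorical_nll(value, params["logits"])
        else:                    # numerical column (assumed layout)
            recon += gaussian_nll(value, params["mean"], params["log_var"])
    return recon + kl_to_standard_normal(mu, log_var)

# Hypothetical two-column row: a standardized amount and a category index.
row = {"amount": 0.0, "channel": 0}
decoder_out = {
    "amount": {"mean": 0.0, "log_var": 0.0},
    "channel": {"logits": [0.0, 0.0]},
}
loss = negative_elbo(row, decoder_out, mu=[0.0], log_var=[0.0])
```

Summing a different negative log-likelihood per column is what lets one decoder handle mixed types; heavy-tailed amounts may call for a different numerical likelihood than the Gaussian assumed here.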

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Train a VAE on banking transactions and demonstrate that it generates synthetic data that is more useful and at least as private as a histogram baseline.
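The brief does not define the histogram baseline. One common reading, assumed here, is a model that fits an independent histogram to each column and samples every column separately, which preserves marginals but discards all cross-column structure. The sketch below follows that reading; the function names, the 20-bin default, and the example columns are illustrative assumptions.

```python
import random
from collections import Counter

def fit_histogram(values, n_bins=20):
    """Fit one column: equal-width bins for numbers, a frequency table otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins
        counts = Counter(
            min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
            for v in values
        )
        kind = "numeric"
    else:
        lo, width = None, None
        counts = Counter(values)
        kind = "categorical"
    return {"kind": kind, "lo": lo, "width": width,
            "bins": list(counts.keys()), "weights": list(counts.values())}

def sample_column(model, n, rng):
    """Draw n values: pick a bin by frequency, then a uniform point inside it."""
    picks = rng.choices(model["bins"], weights=model["weights"], k=n)
    if model["kind"] == "numeric":
        return [model["lo"] + (b + rng.random()) * model["width"] for b in picks]
    return picks

def histogram_baseline(columns, n_rows, seed=0):
    """Sample each column independently from its own fitted histogram."""
    rng = random.Random(seed)
    models = {name: fit_histogram(vals) for name, vals in columns.items()}
    return {name: sample_column(m, n_rows, rng) for name, m in models.items()}

# Hypothetical toy data standing in for two of the 25 columns.
real = {"amount": [1.0, 2.0, 3.0, 10.0], "channel": ["web", "atm", "web", "web"]}
synthetic = histogram_baseline(real, n_rows=100)
```

Because columns are sampled independently, a downstream fraud classifier trained on this output can learn little beyond marginal frequencies, so under this reading the baseline is a low bar for utility while typically being hard to beat on privacy.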

Earning criteria — what you'll demonstrate

  • Build and train a VAE with per-column likelihoods on mixed-type tabular data
  • Apply utility metrics (TSTR) and privacy metrics (MIA) to evaluate synthetic data
  • Reason about the privacy/utility trade-off in generative models
  • Communicate generative-model results to a non-ML data-sharing committee
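The two privacy metrics named in the overview can be sketched in a few lines. This is one possible formulation, not the challenge's reference implementation: the membership inference attack is assumed to be a simple distance-based one (a row is scored by how close it lies to the nearest synthetic row), and the nearest-neighbor distance ratio is taken as the distance from each synthetic row to its closest real row divided by the distance to its second-closest. Rows are assumed to be already encoded as equal-length numeric tuples; mixed-type data would need scaling and encoding first.

```python
import math

def nndr(synthetic, real):
    """Mean ratio d1/d2 of each synthetic row's two nearest real-row distances."""
    ratios = []
    for row in synthetic:
        d1, d2 = sorted(math.dist(row, other) for other in real)[:2]
        # If even the second neighbor is at distance 0, the row duplicates real
        # data; treating that as the worst case (0.0) is an assumption.
        ratios.append(d1 / d2 if d2 > 0 else 0.0)
    return sum(ratios) / len(ratios)

def auc(member_scores, nonmember_scores):
    """P(random member score > random non-member score); ties count half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else 0.5 if m == n else 0.0
    return wins / (len(member_scores) * len(nonmember_scores))

def distance_attack_auc(train, holdout, synthetic):
    """AUC of an attack that guesses 'member' for rows close to synthetic data."""
    def score(row):
        return -min(math.dist(row, s) for s in synthetic)
    return auc([score(r) for r in train], [score(r) for r in holdout])

# Toy example: synthetic rows sit almost on top of the training rows.
train = [(0.0, 0.0), (1.0, 1.0)]
holdout = [(5.0, 5.0), (6.0, 6.0)]
synthetic = [(0.0, 0.1), (1.0, 0.9)]
leak_auc = distance_attack_auc(train, holdout, synthetic)
```

An AUC near 0.5 means the attacker cannot tell training rows from held-out rows, while values approaching 1.0 indicate memorization. A ratio near 0 flags synthetic rows that sit unusually close to one specific real row. Both loops are quadratic, so at 500K rows a subsample or a nearest-neighbor index would be needed.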

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

ML Researcher

Designing a privacy-aware generative model with rigorous utility/privacy evaluation is the kind of project that opens doors at applied-research teams in finance, health, and government.

This challenge sharpens

  • variational-inference
  • deep-generative-models
  • synthetic-data

Applied AI Scientist

Trading off privacy and utility on real banking data is the day-to-day reality of applied AI scientists at regulated startups.

This challenge sharpens

  • deep-generative-models
  • synthetic-data
  • privacy-evaluation

Machine Learning Engineer

Productionizing a VAE training + evaluation pipeline that another engineer can rerun is core MLE craft.

This challenge sharpens

  • pytorch
  • tabular-data
  • synthetic-data

One more thing

You can put a credential on your CV by Friday.