Synthetic Data

If you want to apply synthetic data techniques, every challenge here lets you practice them on a real industry brief.

Recommended Challenges

Code · Intermediate · New
Generate Synthetic Tabular Data with Privacy Guarantees
Implement differentially private (DP) synthetic data generation using DP-CTGAN, PATE-GAN, or a marginal-based DP method such as PrivBayes or MWEM. Train on the real dataset (around 200,000 transactions, 1…
- Synthetic Data
- Differential Privacy
- Generative Models
Privacy-Preserving Machine Learning
Code · Intermediate · New
Variational Autoencoder for Synthetic Tabular Banking Data
You receive a 500K-row anonymized transaction dataset with 25 columns (mixed numerical + categorical). Train a VAE (TabVAE or a small custom model) with appropriate likelihoods …
- Variational Inference
- Deep Generative Models
- Synthetic Data
Probabilistic Machine Learning
Code · Intermediate · New
Instruction-Tune a Small Model for an Edtech Tutor
You receive a roughly 1.5B-parameter base model (e.g., SmolLM-1.7B or Qwen-1.8B), permission to use 2 hours of a rented A100, and a curated seed of around 5,000 math-tutoring dialogues. Augment w…
- Instruction Tuning
- Fine Tuning
- Dataset Curation
Fine-Tuning Large Language Models
Research · Intermediate · New
Evaluate VAEs vs. Diffusion for Synthetic Tabular-Data Generation
You receive a real labeled dataset (around 18,000 anonymized patient records, 32 features, binary outcome) and the team's existing VAE baseline. Train a tabular diffusion model …
- Tabular Diffusion
- VAE
- Synthetic Data
Generative AI
Practice your coursework on real scenarios.
Every challenge is shaped from real-world context — not generic exercises. The work mirrors what your degree prepares you for.
Code · Intermediate · New
Build a Domain Instruction-Tuning Recipe for a Legal Coach
You will source instruction data from three streams: ~3,000 synthetic paralegal Q&A generated by a frontier model (anonymized prompts), ~1,500 curated examples from public legal…
- Instruction Tuning
- Fine Tuning
- Data Curation
Large Language Models
Code · Intermediate · New
Train a VAE for Synthetic Tabular Data at a Healthtech Startup
You receive a synthetic-but-realistic clinical-trial table (around 50,000 patients, 35 columns, mixed continuous and categorical). Train a tabular VAE (or TVAE/CTGAN as alternat…
- VAE
- Tabular Generation
- Synthetic Data
Deep Generative Models

How it works

From brief to credential, in six steps.

Step 01: Browse challenges aligned to your studies.
Step 02: Accept the one that fits your goals.
Step 03: Work through it with AI Copilot guidance.
Step 04: Submit for structured evaluation.
Step 05: Earn a verified credential.
Step 06: Add it to LinkedIn with one click.

Industry teams behind a decade of practitioner briefs

Hiring from this pool?

Sponsor a challenge and meet candidates through actual work.

Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.

