Lab Project: Compare Three Architectures on Your Own Mini-Benchmark

Free · Verified credential · 4 weeks · Advanced

Overview

What this challenge is about.

Scope the problem yourself (suggested examples: sentiment classification in a niche domain, tabular anomaly detection, or time-series forecasting on a public dataset). Define a train/val/test split AND a held-out distribution-shift evaluation set. Implement three architectures from different families (e.g., an MLP, a transformer, and gradient-boosted trees) under a shared hyperparameter budget, and train each with 5 random seeds. Report the mean and 95% confidence interval for each metric, plus a paired statistical test between the top two architectures. Write up the results as a 4-page lab report in NeurIPS style.
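
One way to build the held-out distribution-shift set is to reserve entire groups (a product domain, a site, a later time window) that never appear in training, validation, or the in-distribution test set. The sketch below assumes each row carries a group label; the function name `split_with_shift_holdout`, the 15% val/test fractions, and the group names in the usage are illustrative choices, not part of the brief.

```python
import random
from typing import Dict, Hashable, List, Sequence, Set


def split_with_shift_holdout(
    groups: Sequence[Hashable],
    shift_groups: Set[Hashable],
    val_frac: float = 0.15,
    test_frac: float = 0.15,
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Return row indices for train/val/test plus a held-out shift set.

    Rows whose group is in `shift_groups` never enter train/val/test, so the
    shift set measures performance on a part of the distribution the models
    never saw. (Hypothetical helper for illustration.)
    """
    shift = [i for i, g in enumerate(groups) if g in shift_groups]
    in_dist = [i for i, g in enumerate(groups) if g not in shift_groups]

    # Fixed split seed, kept separate from the per-model training seeds.
    rng = random.Random(seed)
    rng.shuffle(in_dist)

    n = len(in_dist)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    return {
        "test": sorted(in_dist[:n_test]),
        "val": sorted(in_dist[n_test:n_test + n_val]),
        "train": sorted(in_dist[n_test + n_val:]),
        "shift": shift,
    }


# Illustrative usage: hold out the "kitchen" domain entirely.
groups = ["books"] * 40 + ["electronics"] * 40 + ["kitchen"] * 20
splits = split_with_shift_holdout(groups, {"kitchen"})
```

For time-series forecasting, the random shuffle of in-distribution rows would leak future information; a chronological split with the latest window reserved as the shift set is the safer analogue.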

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Design and run a fair three-architecture mini-benchmark with honest statistical reporting and a written lab report.
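
A shared hyperparameter budget can be enforced by giving every architecture the same number of search trials, selecting on validation only, and then retraining the chosen configuration once per evaluation seed. This is a minimal sketch assuming a hypothetical `train_eval(config, seed)` callable per architecture that returns validation, test, and shift scores (higher is better). Counting trials is one notion of budget; wall-clock time or GPU-hours is another, and the report should say which one was used.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, float]
# Hypothetical per-architecture interfaces (assumptions, not a library API):
Sampler = Callable[[random.Random], Config]            # draws one config
TrainEval = Callable[[Config, int], Dict[str, float]]  # -> {"val", "test", "shift"}


def run_benchmark(
    archs: Dict[str, Tuple[Sampler, TrainEval]],
    n_trials: int = 20,
    eval_seeds: Sequence[int] = (0, 1, 2, 3, 4),
    tune_seed: int = 1000,
    search_seed: int = 123,
) -> Dict[str, List[Dict[str, float]]]:
    """Random search with an equal trial budget per architecture, then multi-seed runs."""
    results: Dict[str, List[Dict[str, float]]] = {}
    for name, (sample_config, train_eval) in archs.items():
        # Same RNG seed for every architecture's search.
        rng = random.Random(search_seed)
        best_cfg, best_val = None, float("-inf")
        for _ in range(n_trials):  # identical trial count = shared budget
            cfg = sample_config(rng)
            # Select on validation only, using a seed outside eval_seeds.
            val = train_eval(cfg, tune_seed)["val"]
            if val > best_val:
                best_cfg, best_val = cfg, val
        # Retrain the chosen config once per evaluation seed; keep every run.
        results[name] = [train_eval(best_cfg, s) for s in eval_seeds]
    return results
```

Tuning with a separate seed keeps the 5 evaluation seeds untouched by the selection step, so the per-seed scores are not biased toward the seed that happened to win the search.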

Earning criteria — what you'll demonstrate

Design a fair benchmark across architecture families
Apply statistical testing to ML results (no single-seed claims)
Distinguish in-distribution from distribution-shift performance
Write a publication-style lab report
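
For the per-metric mean and 95% confidence interval, a t-based interval over the per-seed scores is a common choice at 5 seeds, since a normal approximation understates the uncertainty with so few runs. This stdlib-only sketch hardcodes two-sided 95% critical values of Student's t; `scipy.stats.t.ppf(0.975, df)` gives the same values for any df. The function name `mean_ci95` is illustrative, and the interval describes variability across training seeds on a fixed split, not across test examples.

```python
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci95(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and t-based 95% CI over per-seed scores (table covers 2 to 11 seeds)."""
    n = len(scores)
    if not 2 <= n <= 11:
        raise ValueError("need 2 to 11 seeds; use scipy.stats.t.ppf for other sizes")
    m = statistics.mean(scores)
    # Half-width = t_{0.975, n-1} * sample std / sqrt(n).
    half = T_CRIT_95[n - 1] * statistics.stdev(scores) / n ** 0.5
    return m, m - half, m + half
```

For illustrative per-seed accuracies [0.81, 0.83, 0.80, 0.82, 0.84], `mean_ci95` returns a mean of 0.82 with a half-width of about 0.020, which a report would state as 0.820 ± 0.020 (95% CI, 5 seeds).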

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

AI/ML Practicum and Hands-on Lab

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Machine Learning Engineer
AI Engineering

ML Researcher

Designing fair benchmarks and reporting wins with confidence intervals is the daily hygiene of a junior ML researcher, especially at labs that take reproducibility seriously.

This challenge sharpens

experiment-design
statistical-testing
benchmarking

Research Scientist

Multi-seed runs, paired statistical tests, and workshop-style writing mirror the rigor expected from a research scientist's first ablation study.
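
When the top two architectures are run with the same seeds on the same split, their per-seed scores can be paired and tested on the differences. Pairing is most meaningful when seed i also fixes something both models share, such as the data shuffling order or a per-seed resampling of the train/validation split. A minimal stdlib sketch of a paired t-test follows; `paired_t_test` is an illustrative name, and `scipy.stats.ttest_rel(a, b)` computes the same statistic and also returns a p-value. A nonparametric alternative such as the exact Wilcoxon signed-rank test cannot reach p < 0.05 (two-sided) with only 5 pairs, because its smallest achievable two-sided p-value is 2/32 = 0.0625.

```python
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, bool]:
    """Paired t-test on per-seed scores; a[i] and b[i] must come from seed i.

    Returns the t statistic and whether |t| exceeds the two-sided 95% critical value.
    """
    if len(a) != len(b) or not 2 <= len(a) <= 11:
        raise ValueError("need 2 to 11 paired scores")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences identical; t statistic undefined")
    # t = mean difference / standard error of the mean difference, df = n - 1.
    t_stat = statistics.mean(diffs) / (sd / n ** 0.5)
    return t_stat, abs(t_stat) > T_CRIT_95[n - 1]
```

With illustrative per-seed scores a = [0.82, 0.84, 0.81, 0.83, 0.85] and b = [0.80, 0.83, 0.80, 0.81, 0.83], the differences average 0.016 and t is about 6.5 with 4 degrees of freedom, above the 2.776 cutoff.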

This challenge sharpens

statistical-testing
scientific-writing
experiment-design

Applied AI Scientist

The discipline of distribution-shift evaluation translates directly to applied AI work where deployment data never matches training data.

This challenge sharpens

benchmarking
pytorch
deep-learning

One more thing

You can put a credential on your CV by Friday.
