Overview
What this challenge is about.
You will train 4 transformer language models (10M, 50M, 200M, and 600M parameters) on a public pretraining corpus (e.g., a small subset of FineWeb or OpenWebText) using one shared optimization recipe, with each model's training budget scaled by Chinchilla-style compute-optimal ratios. Evaluate each model on a downstream benchmark (e.g., a HellaSwag subset or LAMBADA). Plot loss vs. parameters and downstream metric vs. parameters, with confidence intervals. Deliverables: training scripts, model checkpoints, plots, and a 5-page note interpreting the trend, with explicit caveats about what does and doesn't transfer.
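To make "Chinchilla-style compute-optimal ratios" concrete before launching any runs, a small budgeting helper can turn each target size into a token and FLOP budget. This is a minimal sketch that assumes two common rules of thumb that the brief does not itself specify: roughly 20 training tokens per parameter, and training compute of about C ≈ 6·N·D FLOPs for N parameters and D tokens. The names `compute_optimal_budget` and `MODEL_SIZES` are hypothetical.

```python
# Assumptions (not from the brief): ~20 tokens per parameter (Chinchilla rule of
# thumb) and training compute C ≈ 6 * N * D FLOPs.
TOKENS_PER_PARAM = 20
FLOPS_PER_PARAM_TOKEN = 6

# Hypothetical mapping from the four target sizes in the brief to parameter counts.
MODEL_SIZES = {
    "10M": 10_000_000,
    "50M": 50_000_000,
    "200M": 200_000_000,
    "600M": 600_000_000,
}


def compute_optimal_budget(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> tuple[int, int]:
    """Return (training tokens, approximate training FLOPs) for a model with n_params parameters."""
    tokens = tokens_per_param * n_params
    flops = FLOPS_PER_PARAM_TOKEN * n_params * tokens
    return tokens, flops


if __name__ == "__main__":
    for name, n in MODEL_SIZES.items():
        tokens, flops = compute_optimal_budget(n)
        print(f"{name:>5}: {tokens / 1e9:6.2f}B tokens, {flops:.2e} FLOPs")
```

Under these assumptions the 600M model needs roughly 12B training tokens, so the corpus subset has to be at least that large unless you accept repeated epochs, which is itself a caveat worth recording in the note.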
The Brief
What you'll do, and what you'll demonstrate.
Characterize the scaling trend of tiny transformers on a chosen downstream task with a clean, reproducible methodology.
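One way to characterize the trend is to fit a power law L(N) = a·N^(-alpha) to final loss against parameter count by ordinary least squares in log-log space. This is a sketch under a stated simplification: the Chinchilla-style fit includes an irreducible-loss term (L = E + A/N^alpha + B/D^beta), which four data points constrain poorly, so the pure power law here is a swapped-in, simpler model. `fit_power_law` is a hypothetical helper, and the demo uses synthetic losses only to check that the fit recovers known parameters.

```python
import math


def fit_power_law(params: list[float], losses: list[float]) -> tuple[float, float]:
    """Fit L(N) = a * N**(-alpha) by least squares on (log N, log L); return (a, alpha)."""
    if len(params) != len(losses) or len(params) < 2:
        raise ValueError("need at least two (params, loss) pairs of equal length")
    xs = [math.log(n) for n in params]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope of the log-log regression line is -alpha; the intercept is log(a).
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


if __name__ == "__main__":
    sizes = [10e6, 50e6, 200e6, 600e6]
    # Synthetic losses generated from a known law, purely to sanity-check the fit.
    synthetic = [10.0 * n ** -0.1 for n in sizes]
    a, alpha = fit_power_law(sizes, synthetic)
    print(f"a={a:.3f}, alpha={alpha:.3f}")  # prints a=10.000, alpha=0.100
```

With real runs, reporting the fitted exponent alongside how much it moves when any one of the four models is dropped gives a quick, honest sense of how stable the trend is.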
Earning criteria — what you'll demonstrate
- Train a family of transformers under compute-optimal hyperparameter scaling
- Evaluate downstream task performance with confidence intervals
- Apply scaling-laws-style analysis to a small open benchmark
- Communicate scaling results with honest caveats about transfer
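For the downstream evaluation with confidence intervals, a minimal sketch is to score each multiple-choice item (HellaSwag-style) by picking the candidate ending with the highest length-normalized log-likelihood, then attach a percentile-bootstrap interval to the resulting accuracy. Both choices are assumptions: normalizing by token count is one common variant (harnesses also normalize by character length), and a bootstrap is one of several valid interval methods. The per-example log-likelihoods are assumed to come from your own model; `pick_choice` and `bootstrap_ci` are hypothetical helpers. The interval function works unchanged for LAMBADA-style 0/1 correctness flags.

```python
import random


def pick_choice(logprob_sums: list[float], token_counts: list[int]) -> int:
    """Return the index of the ending with the highest per-token (length-normalized) log-likelihood."""
    scores = [lp / n for lp, n in zip(logprob_sums, token_counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def bootstrap_ci(
    correct: list[int], n_resamples: int = 2000, level: float = 0.95, seed: int = 0
) -> tuple[float, float, float]:
    """Percentile-bootstrap CI for accuracy over 0/1 flags; returns (accuracy, lower, upper)."""
    if not correct:
        raise ValueError("need at least one evaluated example")
    n = len(correct)
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    # Resample examples with replacement and record the accuracy of each resample.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 + level) / 2 * n_resamples))
    return sum(correct) / n, stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Hypothetical per-example flags (1 = model picked the gold ending).
    flags = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    acc, lo, hi = bootstrap_ci(flags)
    print(f"accuracy={acc:.2f}, 95% CI=[{lo:.2f}, {hi:.2f}]")
```

Resampling at the example level captures evaluation-set noise only; variation across training seeds would need separate runs per model size.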
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Research Scientist
A clean small-scale scaling-laws reproduction is exactly the kind of artefact that lands research-scientist interviews at AI labs.
This challenge sharpens
- scaling-laws
- transformer-pretraining
- compute-optimal-training
ML Researcher
Training a model family under controlled hyperparameters and reporting confidence intervals is the methodological core of ML research.
This challenge sharpens
- transformer-pretraining
- benchmark-design
- reproducibility
Machine Learning Engineer
Building a reproducible training and evaluation harness is the core MLE skill set that scaling and research teams hire for.
This challenge sharpens
- pytorch
- reproducibility
- compute-optimal-training