Investigate Scaling Trends on a Small Open Benchmark

Free · Verified credential · 4 weeks · Expert

Overview

What this challenge is about.

You will train four transformer language models (10M, 50M, 200M, and 600M parameters) on a public pretraining corpus (e.g., a small subset of FineWeb or OpenWebText). Use the same optimization recipe for every model, with hyperparameters scaled by Chinchilla-style compute-optimal ratios. Evaluate each model on a downstream benchmark (e.g., a HellaSwag subset or LAMBADA). Plot loss vs. parameters and downstream metric vs. parameters with confidence intervals. Deliverables: training scripts, model checkpoints, plots, and a 5-page note interpreting the trend, with explicit caveats about what does and doesn't transfer.
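As a starting point for budgeting the four runs, here is a minimal sketch of a Chinchilla-style planner. It assumes the commonly quoted rule of thumb of roughly 20 training tokens per parameter and the approximation of 6 × parameters × tokens for training FLOPs. Neither number comes from this brief, so treat both as adjustable assumptions. The names `TOKENS_PER_PARAM`, `token_budget`, `training_flops`, and `plan` are hypothetical.

```python
# Rule-of-thumb budget planner for the four model sizes in the brief.
# ASSUMPTION: ~20 tokens per parameter and FLOPs ~= 6 * N * D are common
# heuristics, not values specified by this challenge.
TOKENS_PER_PARAM = 20

MODEL_SIZES = {
    "10M": 10_000_000,
    "50M": 50_000_000,
    "200M": 200_000_000,
    "600M": 600_000_000,
}


def token_budget(n_params, tokens_per_param=TOKENS_PER_PARAM):
    """Number of training tokens for a model with n_params parameters."""
    return n_params * tokens_per_param


def training_flops(n_params, n_tokens):
    """Approximate training compute using the 6 * N * D heuristic."""
    return 6 * n_params * n_tokens


def plan(model_sizes=MODEL_SIZES):
    """Return one row per model with its token budget and FLOPs estimate."""
    rows = []
    for name, n_params in model_sizes.items():
        n_tokens = token_budget(n_params)
        rows.append(
            {
                "model": name,
                "params": n_params,
                "tokens": n_tokens,
                "flops": training_flops(n_params, n_tokens),
            }
        )
    return rows


if __name__ == "__main__":
    for row in plan():
        print(f"{row['model']:>5}  tokens={row['tokens']:.2e}  flops={row['flops']:.2e}")
```

If your corpus subset is smaller than the largest token budget, you will need either a lower ratio or repeated epochs. Either choice is worth stating as a caveat in the note.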

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Characterize the scaling trend of tiny transformers on a chosen downstream task with a clean, reproducible methodology.
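One simple way to characterize the trend is to fit a pure power law, loss ≈ a · N^(-alpha), by least squares in log-log space. The sketch below assumes that functional form, with no irreducible-loss term, and the function name `fit_power_law` is hypothetical. With only four model sizes the fit is weakly constrained, so read the exponent as descriptive rather than predictive.

```python
import math


def fit_power_law(params, losses):
    """Fit loss ~= a * params**(-alpha) via least squares on log-log values.

    ASSUMPTION: a pure power law with no irreducible-loss offset.
    Returns (a, alpha).
    """
    if len(params) != len(losses) or len(params) < 2:
        raise ValueError("need at least two (params, loss) pairs of equal length")

    xs = [math.log(p) for p in params]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Ordinary least-squares slope and intercept of log(loss) on log(params).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    return math.exp(intercept), -slope
```

Call it with your four parameter counts and their final validation losses. Plotting the fitted line over the measured points on log-log axes makes deviations from the power law easy to see.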

Earning criteria — what you'll demonstrate

Train a family of transformers under compute-optimal hyperparameter scaling
Evaluate downstream task performance with confidence intervals
Apply scaling-laws-style analysis to a small open benchmark
Communicate scaling results with honest caveats about transfer
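For the confidence-interval criterion, one option is a percentile bootstrap over per-example correctness on the downstream task. The sketch below assumes you have a list of 0/1 outcomes per evaluation example. The function name `bootstrap_accuracy_ci` and its defaults are hypothetical. It captures evaluation-set sampling noise only, not run-to-run variance from different training seeds.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for accuracy.

    correct: list of 0/1 outcomes, one per evaluation example.
    Returns (accuracy, lower, upper).
    """
    if not correct:
        raise ValueError("need at least one evaluation example")

    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(correct)
    point = sum(correct) / n

    # Resample the evaluation set with replacement and record each accuracy.
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(correct, k=n)
        means.append(sum(sample) / n)
    means.sort()

    # Take the empirical quantiles at the two tails.
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * (n_resamples - 1))]
    upper = means[int((1.0 - tail) * (n_resamples - 1))]
    return point, lower, upper
```

Run it once per model and draw the returned bounds as error bars on the metric-vs-parameters plot. If you can afford multiple training seeds per size, reporting that spread separately gives a fuller picture.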

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Large Language Models

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Research Scientist

A clean small-scale scaling-laws reproduction is exactly the kind of artefact that lands research-scientist interviews at AI labs.

This challenge sharpens

scaling-laws
transformer-pretraining
compute-optimal-training

ML Researcher

Training a model family under controlled hyperparameters and reporting confidence intervals is the methodological core of ML research.

This challenge sharpens

transformer-pretraining
benchmark-design
reproducibility

Machine Learning Engineer

Building the reproducible training and evaluation harness is the MLE skillset that scaling-and-research teams hire for.

This challenge sharpens

pytorch
reproducibility
compute-optimal-training

One more thing

You can put a credential on your CV by Friday.
