Code

Build a Small Transformer from Scratch and Train It on Code

Free · Verified credential · 4 weeks · Advanced

Overview

What this challenge is about.

Implement multi-head self-attention, RMSNorm, rotary positional embeddings (RoPE), and a causal language-model (LM) head from scratch. No Hugging Face shortcuts for the model code, although you may use Hugging Face Tokenizers for BPE. Train on a small code corpus of around 800M tokens (a subset of The Stack or a curated Python-only dump) on a single A100 for one GPU-day. Report training loss curves, evaluation perplexity on a held-out split, and one sample generation per epoch. Write a 4-page learnings note covering one attention-head visualization, one positional-encoding experiment, and one training-instability fix you encountered.
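The core components above can be sketched in plain Python to make the math explicit before porting it to tensors. This is a minimal sketch under stated assumptions, not a reference implementation: the helper names (`rmsnorm`, `rope`, `causal_mha`, `perplexity`) are hypothetical, there are no biases, the weights are random toys, and RoPE rotates adjacent pairs of dimensions with base 10000 (some implementations rotate the two halves of each head instead). The LM head is omitted because it is a single linear projection from the model width to vocabulary logits.

```python
import math
import random


def rmsnorm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a learned per-channel gain."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def rope(vec, pos, base=10000.0):
    """Rotary embedding: rotate each pair (vec[i], vec[i+1]) by angle pos * base**(-i/d)."""
    d = len(vec)
    out = list(vec)
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def matvec(W, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def causal_mha(xs, Wq, Wk, Wv, Wo, n_heads):
    """Causal multi-head self-attention with RoPE applied per head to queries and keys.
    Returns (outputs, attn), where attn[h][t] holds head h's weights for query t."""
    hd = len(xs[0]) // n_heads  # head dimension (must be even for RoPE)
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    outs, attn = [], [[] for _ in range(n_heads)]
    for t in range(len(xs)):
        heads_out = []
        for h in range(n_heads):
            lo, hi = h * hd, (h + 1) * hd
            q = rope(qs[t][lo:hi], t)
            # Causal mask: query t only scores keys at positions 0..t.
            scores = [sum(a * b for a, b in zip(q, rope(ks[j][lo:hi], j))) / math.sqrt(hd)
                      for j in range(t + 1)]
            w = softmax(scores)
            attn[h].append(w)
            heads_out += [sum(w[j] * vs[j][lo + i] for j in range(t + 1)) for i in range(hd)]
        outs.append(matvec(Wo, heads_out))  # concatenated heads projected back to d_model
    return outs, attn


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy usage with random weights: d_model=8, 2 heads, 5 positions.
random.seed(0)
d, n_heads, T = 8, 2, 5


def rand_mat():
    return [[random.gauss(0, 0.3) for _ in range(d)] for _ in range(d)]


Wq, Wk, Wv, Wo = rand_mat(), rand_mat(), rand_mat(), rand_mat()
xs = [rmsnorm([random.gauss(0, 1) for _ in range(d)], [1.0] * d) for _ in range(T)]
out, attn = causal_mha(xs, Wq, Wk, Wv, Wo, n_heads)
```

Each `attn[h][t]` is a probability distribution over positions `0..t`, which is exactly the matrix an attention-head visualization plots. Because RoPE applies the same rotation to queries and keys, each query-key score depends only on the offset between their positions, which is the property a positional-encoding experiment can probe.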

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Implement and train a 30M-parameter decoder-only transformer from scratch on a code corpus, demonstrating a working understanding of both attention and the training process.
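As a sanity check on model size and compute budget, here is a rough sketch of the arithmetic. The configuration is a hypothetical example, not a prescribed one: a LLaMA-style block (RMSNorm, RoPE, SwiGLU feed-forward, no biases) with `d_model=512`, 8 layers, `d_ff=1408`, an 8,192-token BPE vocabulary, and an LM head tied to the embedding. Compute uses the common approximation of about 6·N·D training FLOPs for N parameters and D tokens (it ignores attention-score FLOPs), the A100's 312 TFLOPS dense BF16 peak, and an assumed 35% model-FLOPs utilization.

```python
def param_count(vocab, d_model, n_layers, d_ff, tied=True):
    """Approximate parameter count of a LLaMA-style decoder (no biases; RoPE has no weights)."""
    attn = 4 * d_model * d_model            # W_q, W_k, W_v, W_o
    ffn = 3 * d_model * d_ff                # SwiGLU: gate, up, and down projections
    norms = 2 * d_model                     # two RMSNorm gains per block
    embed = vocab * d_model                 # token embedding table
    head = 0 if tied else vocab * d_model   # LM head reuses the embedding when tied
    return n_layers * (attn + ffn + norms) + embed + head + d_model  # + final RMSNorm


def training_hours(n_params, n_tokens, peak_flops=312e12, mfu=0.35):
    """Wall-clock hours for one pass over n_tokens using the ~6*N*D FLOPs rule.
    peak_flops: A100 dense BF16 peak; mfu: assumed model-FLOPs utilization."""
    return 6 * n_params * n_tokens / (peak_flops * mfu) / 3600


n = param_count(vocab=8192, d_model=512, n_layers=8, d_ff=1408)
print(n)                                   # 29893120 (~29.9M)
print(round(training_hours(n, 800e6), 2))  # hours per pass over 800M tokens
```

Under these assumptions one pass over 800M tokens takes about 22 minutes, so a GPU-day allows dozens of epochs on paper. Small models rarely sustain high utilization, and evaluation and sampling add overhead, so it is worth measuring real tokens per second early and budgeting epochs from that. For scale, the rough 20-tokens-per-parameter heuristic from compute-optimal scaling work puts a 30M-parameter model at about 600M tokens, so an 800M-token corpus is in that range.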

Earning criteria: what you'll demonstrate

  • Implement self-attention, RoPE, and an LM head from first principles
  • Train a transformer end-to-end on a non-toy corpus
  • Visualize and interpret attention patterns
  • Diagnose and fix training instabilities (loss spikes, NaNs)
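For the loss spikes and NaNs in the last criterion, two common first-line mitigations are global-norm gradient clipping and skipping updates whose loss is non-finite or far above recent values. The plain-Python sketch below shows both under stated assumptions; in PyTorch the first is available as `torch.nn.utils.clip_grad_norm_`, which also returns the total norm. `SpikeGuard` and its thresholds (window 50, k = 3, 10 warm-up steps) are hypothetical choices for illustration, not tuned values.

```python
import math
from collections import deque


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient by one shared factor so their joint L2 norm is <= max_norm.
    grads is a list of flat float lists, one per parameter tensor.
    Returns (clipped_grads, total_norm); a non-finite norm means the step should be skipped."""
    total = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if not math.isfinite(total):
        return grads, total
    scale = min(1.0, max_norm / (total + eps))
    return [[g * scale for g in tensor] for tensor in grads], total


class SpikeGuard:
    """Rejects a step whose loss is NaN/inf or more than k times the recent (upper) median."""

    def __init__(self, window=50, k=3.0, warmup=10):
        self.history = deque(maxlen=window)
        self.k = k
        self.warmup = warmup  # losses to observe before spike checks start

    def check(self, loss):
        if not math.isfinite(loss):
            return False
        if len(self.history) >= self.warmup:
            median = sorted(self.history)[len(self.history) // 2]
            if loss > self.k * median:
                return False  # spike: skip the update and keep it out of the history
        self.history.append(loss)
        return True


clipped, norm = clip_by_global_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
print(norm)  # 5.0
```

Logging the returned total norm every step is also useful diagnostically: a rising gradient norm can precede a loss spike, and lowering the peak learning rate or lengthening warmup is a common follow-up fix.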

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

ML Researcher

Implementing a transformer from scratch is a common first project for new members of research teams; this challenge gives you exactly that portfolio piece.

This challenge sharpens

  • transformers
  • self-attention
  • language-modeling

Research Scientist

Implementing and ablating positional encodings is representative of the foundational work research scientists do on architecture-research teams.

This challenge sharpens

  • self-attention
  • rope
  • training-debugging

Applied AI Scientist

Deep PyTorch fluency at the layer-implementation level translates directly into applied AI work where off-the-shelf model implementations aren't enough.

This challenge sharpens

  • pytorch
  • transformers
  • training-debugging
