Pretrain a Small Vision Transformer with Self-Supervised Learning
Overview
What this challenge is about.
You receive 80,000 unlabeled 224x224 histology tiles plus 4,000 labeled tiles split into train/val/test. Pretrain a ViT-Small with a self-supervised method of your choice (DINOv2 recommended, MAE acceptable) within a fixed compute budget of 12 GPU-hours on a single A100 (or proportional time on a smaller GPU). Then fine-tune for tile-level classification and compare against two baselines: (a) the same ViT-Small trained from scratch on the 4,000 labels, and (b) an ImageNet-pretrained ViT-Small fine-tuned on the 4,000 labels. Report accuracy and macro-F1 with bootstrap confidence intervals for all three models, and write a recommendation memo.
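The reporting step can be sketched as follows. This is a minimal, standard-library-only illustration of a percentile bootstrap over test-set predictions, not a prescribed implementation. The function names, the toy labels, the 1,000 resamples, and the choice to score a class as F1 = 0 when it is absent from a resample are all assumptions made for the example.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over a fixed label set."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample test tiles with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy predictions (hypothetical); replace with real test-set outputs.
y_true = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
y_pred = [0, 1, 1, 1, 2, 0, 0, 1, 2, 0]
labels = sorted(set(y_true))  # keep the label set fixed across resamples

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, labels)
acc_ci = bootstrap_ci(y_true, y_pred, accuracy)
f1_ci = bootstrap_ci(y_true, y_pred, lambda t, p: macro_f1(t, p, labels))
print(f"accuracy {acc:.3f} CI {acc_ci}, macro-F1 {f1:.3f} CI {f1_ci}")
```

Resampling the test tiles, rather than retraining, captures only the uncertainty from the finite test set; variation across training seeds would need separate runs.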
The Brief
What you'll do, and what you'll demonstrate.
Determine whether self-supervised pretraining on unlabeled tiles meaningfully outperforms ImageNet pretraining for this team's downstream histology task.
Earning criteria — what you'll demonstrate
- Implement a modern self-supervised pretraining objective end-to-end
- Design a fair fine-tuning comparison under a fixed compute budget
- Quantify model performance with statistical rigor (bootstrap CIs)
- Translate research findings into actionable team-process recommendations
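One way to make the comparison between two models concrete is a paired bootstrap on the accuracy difference, evaluated on the same test tiles. The sketch below is an assumption about how that could be done, not a required method. The names `paired_bootstrap_diff`, `pred_ssl`, and `pred_imagenet`, the toy data, and the 2,000 resamples are hypothetical.

```python
import random

def paired_bootstrap_diff(y_true, pred_a, pred_b,
                          n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for accuracy(A) - accuracy(B) on shared test tiles.

    Each resample draws tile indices once and scores both models on the
    same draw, so per-tile difficulty cancels out of the difference.
    """
    rng = random.Random(seed)
    n = len(y_true)
    correct_a = [int(t == p) for t, p in zip(y_true, pred_a)]
    correct_b = [int(t == p) for t, p in zip(y_true, pred_b)]
    observed = (sum(correct_a) - sum(correct_b)) / n
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return observed, lo, hi

# Toy predictions (hypothetical); replace with real test-set outputs.
y_true        = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
pred_ssl      = [0, 1, 2, 0, 1, 2, 0, 1, 0, 0]  # 9 of 10 correct
pred_imagenet = [0, 1, 2, 0, 2, 2, 1, 1, 0, 0]  # 7 of 10 correct

observed, lo, hi = paired_bootstrap_diff(y_true, pred_ssl, pred_imagenet)
print(f"accuracy difference {observed:+.3f}, 95% CI [{lo:+.3f}, {hi:+.3f}]")
```

If the interval excludes zero, the test set supports a difference between the two models; if it straddles zero, the memo should say the data do not separate them.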
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Implementing self-supervised pretraining on a real downstream task and writing it up with statistical rigor is precisely the workload of a first-year ML researcher at an applied lab.
This challenge sharpens
- self-supervised-learning
- vision-transformers
- experiment-design