Overview
What this challenge is about.
You receive about 12,000 short webcam clips (1 to 3 seconds each) covering a 50-word vocabulary, with body and hand pose features pre-extracted (e.g., MediaPipe Holistic landmarks per frame). Build a sequence model (a Transformer encoder or a BiLSTM over the pose features) for isolated-word classification, and compare it against a frame-averaging baseline. Report top-1 and top-5 accuracy, plus a confusion matrix that highlights the most confusable word pairs. Deliver a 3-page evaluation memo and a short screen recording that demonstrates inference on 5 held-out clips.
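To make the comparison concrete, here is a minimal PyTorch sketch of the two models side by side: a frame-averaging baseline and a small Transformer encoder. It is one possible starting point under stated assumptions, not the required design. The names `valid_mask`, `MeanPoolBaseline`, and `PoseTransformer` are hypothetical, as are the feature width (`FEATURE_DIM = 150`), the maximum clip length (`MAX_FRAMES = 128`), and every hyperparameter. The sketch also assumes clips arrive zero-padded in a batch together with their true frame counts.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 50    # vocabulary size stated in the brief
FEATURE_DIM = 150   # assumption: e.g. 75 body+hand landmarks x (x, y); check your data
MAX_FRAMES = 128    # assumption: upper bound on frames per clip


def valid_mask(lengths, max_len):
    """Boolean (batch, max_len) mask: True for real frames, False for padding."""
    return torch.arange(max_len, device=lengths.device)[None, :] < lengths[:, None]


class MeanPoolBaseline(nn.Module):
    """Frame-averaging baseline: mean over real frames, then one linear layer."""

    def __init__(self, feature_dim=FEATURE_DIM, num_classes=NUM_CLASSES):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, num_classes)

    def forward(self, x, lengths):
        # x: (batch, frames, feature_dim); lengths: (batch,) true frame counts
        mask = valid_mask(lengths, x.size(1)).unsqueeze(-1).to(x.dtype)
        mean = (x * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
        return self.classifier(mean)


class PoseTransformer(nn.Module):
    """Transformer encoder over per-frame pose features with masked mean pooling."""

    def __init__(self, feature_dim=FEATURE_DIM, num_classes=NUM_CLASSES,
                 d_model=128, nhead=4, num_layers=2, max_len=MAX_FRAMES):
        super().__init__()
        self.input_proj = nn.Linear(feature_dim, d_model)
        # Learned positional embedding so the model can use frame order.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=256,
            dropout=0.1, batch_first=True,
        )
        # Nested tensors are disabled so the output keeps the padded length.
        self.encoder = nn.TransformerEncoder(
            layer, num_layers=num_layers, enable_nested_tensor=False
        )
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, x, lengths):
        mask = valid_mask(lengths, x.size(1))
        h = self.input_proj(x) + self.pos_embed[:, : x.size(1)]
        # src_key_padding_mask expects True at positions to ignore.
        h = self.encoder(h, src_key_padding_mask=~mask)
        m = mask.unsqueeze(-1).to(h.dtype)
        pooled = (h * m).sum(dim=1) / m.sum(dim=1).clamp(min=1)
        return self.classifier(pooled)
```

Both models take the same `(x, lengths)` inputs and return `(batch, 50)` logits, so one training and evaluation loop can serve both. The baseline discards frame order entirely, so the accuracy gap between the two gives a rough measure of how much temporal structure matters for this vocabulary.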
The Brief
What you'll do, and what you'll demonstrate.
Train a sequence model that classifies isolated signs from pose features, aiming for production-grade accuracy and measuring it by top-1 and top-5 accuracy against a frame-averaging baseline.
Earning criteria — what you'll demonstrate
- Apply sequence models to short multivariate time series
- Use pose features as a compact intermediate representation for sign data
- Diagnose confusable classes in fine-grained classification
- Communicate results respectfully to the community that provided the data
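For the evaluation side of these criteria, the sketch below shows one way to compute top-k accuracy and to rank the most confusable class pairs from per-clip class scores. It uses only the standard library. The function names `top_k_accuracy` and `most_confused_pairs` are hypothetical, and the three-class toy scores are made up purely for illustration; they are not results from the dataset.

```python
from collections import Counter


def top_k_accuracy(scores, labels, k):
    """Fraction of clips whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def most_confused_pairs(scores, labels, n=5):
    """Top-1 error counts per unordered class pair, most frequent first."""
    pairs = Counter()
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        if pred != label:
            # Sort the pair so A-mistaken-for-B and B-mistaken-for-A add up.
            pairs[tuple(sorted((label, pred)))] += 1
    return pairs.most_common(n)


# Toy example: 4 clips, 3 classes (illustrative numbers only).
scores = [
    [0.7, 0.2, 0.1],  # true 0, predicted 0
    [0.4, 0.5, 0.1],  # true 0, predicted 1
    [0.6, 0.3, 0.1],  # true 1, predicted 0
    [0.1, 0.2, 0.7],  # true 2, predicted 2
]
labels = [0, 0, 1, 2]

print(top_k_accuracy(scores, labels, 1))         # 0.5
print(top_k_accuracy(scores, labels, 2))         # 1.0
print(most_confused_pairs(scores, labels, n=1))  # [((0, 1), 2)]
```

Counting pairs without regard to direction matches the brief's wording ("most confusable word pairs"). If the direction of the error matters for your memo, keep the `(label, pred)` tuple unsorted instead. Map class indices back to vocabulary words before reporting them.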
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Training sequence models on a small, real, community-collected dataset is the kind of focused project that hiring loops for ML researchers respect.
This challenge sharpens
- sequence-models
- transformer
- model-evaluation
Computer Vision Engineer
Pose-based perception with a sequence model is a transferable CV skill, especially for accessibility, fitness, and gaming products.
This challenge sharpens
- pose-estimation
- classification
- pytorch
Applied AI Scientist
Producing community-respectful evaluation artifacts (not just metrics) is what applied AI scientists ship in any accessibility-adjacent product.
This challenge sharpens
- sequence-models
- model-evaluation
- classification