Overview
What this challenge is about.
Pick a model (Video-LLaVA, VideoChat2, or LLaVA-Video) and justify the choice against the A10G budget. Build a Streamlit demo: upload a video, ask a question, and get an answer with cited frame timestamps. Construct a 25-question eval set drawn from publicly available sample corporate-training videos. Have two graders score each answer on a 3-point scale (correct, partially correct, or wrong). Report inter-grader agreement and overall accuracy. Write a 3-page pitch deck that the consultant lead can use with the prospective client.
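The two-grader scoring step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the label names, the credit mapping (correct = 1.0, partially correct = 0.5, wrong = 0.0), and the choice of Cohen's kappa as the agreement statistic are all assumptions made for the example, and the function names are hypothetical.

```python
from collections import Counter

# Assumed label vocabulary for the 3-point scale.
LABELS = ("correct", "partial", "wrong")
# Assumed credit per label; the brief does not fix a numeric mapping.
CREDIT = {"correct": 1.0, "partial": 0.5, "wrong": 0.0}


def _check(a, b):
    """Both graders must have scored the same non-empty set of answers."""
    if len(a) != len(b) or not a:
        raise ValueError("grader label lists must be non-empty and equal length")


def percent_agreement(a, b):
    """Fraction of answers on which the two graders gave the same label."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two graders (Cohen's kappa)."""
    _check(a, b)
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if each grader labeled at random with their own rates.
    p_expected = sum(counts_a[l] * counts_b[l] for l in LABELS) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both graders used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


def mean_credit(labels):
    """Average credit for one grader's labels under the assumed mapping."""
    return sum(CREDIT[l] for l in labels) / len(labels)


def overall_accuracy(a, b):
    """Overall accuracy as the mean of the two graders' average credit."""
    _check(a, b)
    return (mean_credit(a) + mean_credit(b)) / 2
```

With 25 questions, each grader supplies a list of 25 labels. Reporting raw percent agreement alongside kappa helps, because kappa can look low on a small set where most answers share one label.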
The Brief
What you'll do, and what you'll demonstrate.
In 5 days, ship a video-QA demo on a single A10G that can land a corporate-training prospect.
Earning criteria — what you'll demonstrate
- Pick an appropriate video-language model under a hardware budget
- Build a working multimodal demo on a tight timeline
- Run a small-but-honest eval with multiple graders
- Translate a working prototype into a client-facing pitch
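For the first criterion, a rough back-of-the-envelope check can help frame the model choice. The sketch below estimates weight memory from parameter count and precision and compares it with the 24 GB of VRAM on an A10G. The parameter counts passed in and the `overhead_gb` allowance (standing in for activations, KV cache, and the vision encoder) are illustrative assumptions, not measured figures; confirm sizes against each model card and measure real usage on the GPU.

```python
# NVIDIA A10G ships with 24 GB of GPU memory.
A10G_VRAM_GB = 24.0

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion, dtype):
    """Approximate weight memory in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion, dtype, overhead_gb=6.0, budget_gb=A10G_VRAM_GB):
    """True if weights plus an assumed runtime overhead fit in the budget.

    overhead_gb is a hypothetical placeholder, not a measured value.
    """
    return weight_gb(params_billion, dtype) + overhead_gb <= budget_gb


if __name__ == "__main__":
    # Hypothetical model sizes, used only to show the arithmetic.
    for size in (7, 13):
        for dtype in ("fp16", "int8", "int4"):
            print(size, dtype, weight_gb(size, dtype), fits(size, dtype))
```

Under these assumptions a 7B-parameter model in fp16 needs about 14 GB for weights and leaves headroom for video frames, while a 13B model would need quantization to fit. Longer clips and more sampled frames raise the overhead, so the allowance should be tuned from actual runs.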
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Building a working multimodal demo on a tight budget and turning it into a pitch is the AI-engineer skill set consultancies hire for repeatedly.
This challenge sharpens
- video-language-models
- streamlit
- demo-engineering
Applied AI Scientist
Choosing the right model under a hardware budget and validating with a two-grader eval is core applied-AI-scientist work.
This challenge sharpens
- video-language-models
- multimodal-fusion
- evaluation
AI Solutions Architect
Translating a working demo into a solutions pitch is the AI solutions architecture work consultancies sell to enterprise clients.
This challenge sharpens
- demo-engineering
- video-language-models
- evaluation