Overview
What this challenge is about.
Use a tabletop simulator (PyBullet or Isaac Sim, both open) with 5 object types and 5 spatial relations (left of, right of, behind, in front of, on top of). Curate or generate around 300 utterance-scene-action triples for training and 50 for evaluation. Build a three-stage pipeline: scene perception (object detection) → grounded language parsing (utterance + scene → executable pick action) → simulator execution. Measure task success rate on the 50 evaluation triples. Deliver a working demo, a 3-page methods note, and a short screencast.
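As an illustration of the grounded-parsing stage described above, here is a minimal sketch that turns an utterance plus a list of detected objects into a single pick target. Everything in it is an assumption made for the example, not part of the challenge specification: the `SceneObject` type, the axis convention (x grows to the right, y grows away from the viewer, z is height), the margin values, the object names, and the fixed utterance template "pick the <target> <relation> the <anchor>". A real submission would likely replace the template parser with a learned model and feed the result to the simulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MARGIN = 0.02    # metres; assumed dead zone so near-ties count as "no relation"
STACK_XY = 0.05  # metres; assumed max horizontal offset for "on top of"


@dataclass(frozen=True)
class SceneObject:
    """One detected object (hypothetical output of the perception stage)."""
    name: str  # object type, e.g. "cube"
    x: float   # assumed: left (-) to right (+), viewer frame
    y: float   # assumed: near (-) to far (+)
    z: float   # height above the table


def _on_top(t: SceneObject, a: SceneObject) -> bool:
    # Higher than the anchor and roughly aligned with it horizontally.
    return (t.z > a.z + MARGIN
            and abs(t.x - a.x) < STACK_XY
            and abs(t.y - a.y) < STACK_XY)


# Each predicate answers: is target `t` in this relation to anchor `a`?
RELATIONS: Dict[str, Callable[[SceneObject, SceneObject], bool]] = {
    "left of": lambda t, a: t.x < a.x - MARGIN,
    "right of": lambda t, a: t.x > a.x + MARGIN,
    "behind": lambda t, a: t.y > a.y + MARGIN,
    "in front of": lambda t, a: t.y < a.y - MARGIN,
    "on top of": _on_top,
}


def parse(utterance: str) -> Optional[Tuple[str, str, str]]:
    """Template parse into (target, relation, anchor), or None if no match."""
    text = utterance.lower().strip().rstrip(".")
    prefix = "pick the "
    if not text.startswith(prefix):
        return None
    for rel in RELATIONS:
        marker = f" {rel} the "
        if marker in text:
            head, anchor = text.split(marker, 1)
            return head[len(prefix):], rel, anchor
    return None


def resolve(utterance: str, scene: List[SceneObject]) -> Optional[SceneObject]:
    """Return the unique object to pick, or None if unparsed or ambiguous."""
    parsed = parse(utterance)
    if parsed is None:
        return None
    target, rel, anchor = parsed
    anchors = [o for o in scene if o.name == anchor]
    candidates = [
        o for o in scene
        if o.name == target
        and any(RELATIONS[rel](o, a) for a in anchors if a is not o)
    ]
    return candidates[0] if len(candidates) == 1 else None


scene = [
    SceneObject("cube", -0.20, 0.0, 0.02),
    SceneObject("cube", 0.20, 0.0, 0.02),
    SceneObject("bowl", 0.00, 0.0, 0.02),
]
picked = resolve("Pick the cube left of the bowl", scene)
```

Returning `None` for an ambiguous or unparseable utterance, rather than guessing, is a design choice in this sketch: it lets the evaluation count such cases as failures explicitly instead of letting the simulator execute an arbitrary pick.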
The Brief
What you'll do, and what you'll demonstrate.
Ship a grounded-language pick-and-place demo with a documented task-success rate on a controlled vocabulary of spatial relations.
Earning criteria — what you'll demonstrate
- Build a grounded-language pipeline (scene + utterance → action)
- Apply reference resolution and spatial-relation handling
- Evaluate language-to-action systems with task-success metrics
- Communicate a research demo to an investor audience
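For the task-success criterion above, one possible way to report results is an overall success rate plus a breakdown per spatial relation. The sketch below assumes each evaluation episode has already been reduced to a `(relation, succeeded)` pair; how "succeeded" is decided (for example, the simulator confirming that the intended object was picked) is left to the submission, and the per-relation breakdown is a suggestion rather than a stated requirement.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of successful episodes, overall and per spatial relation."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for relation, succeeded in results:
        # Count every episode once under its relation and once under "overall".
        for key in (relation, "overall"):
            totals[key] += 1
            wins[key] += int(succeeded)
    return {key: wins[key] / totals[key] for key in totals}


# Hypothetical episode outcomes, for illustration only.
episodes = [
    ("left of", True),
    ("left of", False),
    ("on top of", True),
    ("behind", True),
]
rates = success_rates(episodes)
```

With only 50 evaluation triples, each relation gets roughly ten episodes if they are balanced, so per-relation rates are coarse and are best read alongside the overall figure.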
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Grounded-language demos that connect perception, parsing, and action are the AI-engineering work robotics startups hire for at every stage.
This challenge sharpens
- grounded-language-understanding
- perception
- simulation
ML Researcher
Spatial-relation evaluation and reference resolution are open research problems; this project gives an ML researcher a publication-track foundation.
This challenge sharpens
- grounded-language-understanding
- semantic-parsing
- evaluation
Applied AI Scientist
Bridging a research method into an investor-ready demo is the applied-AI craft that turns capability into capital.
This challenge sharpens
- semantic-parsing
- evaluation
- perception