Overview
What this challenge is about.
You receive 1,500 labeled shelf photos with anonymized product crops, bounding boxes, and about 12 relation types. Build a pipeline that, for a new shelf photo, outputs (a) detected products, (b) a scene graph of spatial and stock relations, and (c) a short structured insight summary (for example, 'Brand X has lost 2 facings vs the planogram'). Use an off-the-shelf detector (YOLOv8 or DETR) with a lightweight relation predictor on top. Evaluate recall@K of the predicted relations against the labels, and review the outputs qualitatively on a 50-photo holdout. Then produce a written analysis: for which of the 4 brand-team questions (planogram drift, share of shelf, adjacency to competitors, out-of-stock) does the scene-graph approach give a better answer than the current detector-only baseline?
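To make the three outputs concrete, here is a minimal sketch that goes from detections to relation triples to a one-line insight. It is not the challenge's reference implementation: the `Box` fields, the `left_of` relation name, the 0.5 row-overlap threshold, and the planogram format (a brand-to-expected-facings dict) are all assumptions for illustration, and a purely geometric rule stands in for the learned relation predictor.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One detected product (hypothetical schema): brand label plus pixel box."""
    brand: str
    x1: float
    y1: float
    x2: float
    y2: float


def center_x(b):
    return (b.x1 + b.x2) / 2


def same_shelf(a, b, min_overlap=0.5):
    """Assume two boxes share a shelf row if their vertical extents overlap enough."""
    overlap = min(a.y2, b.y2) - max(a.y1, b.y1)
    shorter = min(a.y2 - a.y1, b.y2 - b.y1)
    return shorter > 0 and overlap / shorter >= min_overlap


def left_of_relations(boxes):
    """Emit (subject_idx, 'left_of', object_idx) for the nearest right-hand neighbour on the same row."""
    triples = []
    for i, a in enumerate(boxes):
        right = [(center_x(b), j) for j, b in enumerate(boxes)
                 if j != i and same_shelf(a, b) and center_x(b) > center_x(a)]
        if right:
            _, j = min(right)
            triples.append((i, "left_of", j))
    return triples


def facing_delta(boxes, planogram):
    """Observed minus expected facings per brand (negative means facings lost)."""
    observed = Counter(b.brand for b in boxes)
    brands = sorted(set(observed) | set(planogram))
    return {brand: observed.get(brand, 0) - planogram.get(brand, 0) for brand in brands}


def summarize(delta):
    """Turn negative deltas into short insight lines."""
    return [f"Brand {brand} has lost {-d} facings vs the planogram"
            for brand, d in delta.items() if d < 0]


boxes = [
    Box("X", 0, 0, 10, 20),
    Box("Y", 12, 0, 22, 20),
    Box("X", 24, 0, 34, 20),
    Box("X", 0, 30, 10, 50),   # a second shelf row
]
relations = left_of_relations(boxes)
delta = facing_delta(boxes, {"X": 5, "Y": 1})
insights = summarize(delta)
```

In a real submission the boxes would come from the detector and the triples from the relation head; a geometric rule like this one can still serve as a sanity-check baseline for the spatial relation types.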
The Brief
What you'll do, and what you'll demonstrate.
Build a scene-graph generation pipeline for shelf photos and prove (or disprove) that scene graphs unlock insights a detector-only baseline cannot.
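One way to frame the comparison is to ask which brand-team questions need relation edges at all. The sketch below is illustrative only: it assumes detections are reduced to a list of brand labels indexed by detection id, and it uses a hypothetical `adjacent_to` relation name that may not match the labeled relation types. Share of shelf can be computed from detections alone, whereas competitor adjacency needs the graph.

```python
from collections import Counter


def share_of_shelf(brands):
    """Fraction of detected facings per brand; needs detections only."""
    counts = Counter(brands)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()} if total else {}


def competitor_adjacency(brands, relations, own_brand):
    """Count own-brand facings with at least one adjacent competitor; needs relation edges."""
    touching = set()
    for subj, rel, obj in relations:
        if rel != "adjacent_to":  # hypothetical relation name
            continue
        # Treat adjacency as symmetric and check both directions.
        for me, other in ((subj, obj), (obj, subj)):
            if brands[me] == own_brand and brands[other] != own_brand:
                touching.add(me)
    return len(touching)


brands = ["X", "Y", "X", "X"]
relations = [(0, "adjacent_to", 1), (1, "adjacent_to", 2), (2, "adjacent_to", 3)]
shares = share_of_shelf(brands)
exposed = competitor_adjacency(brands, relations, "X")
```

If a question can be answered as well by the first kind of function as by the second, that is evidence the scene graph adds little for that question, which is a legitimate finding for the written analysis.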
Earning criteria — what you'll demonstrate
- Implement a scene-graph generation pipeline combining detection and relation prediction
- Apply scene-understanding metrics (recall@K of relations) honestly
- Reason about when a richer representation actually unlocks downstream value
- Communicate computer-vision results to a product audience without jargon
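For the recall@K criterion, a minimal sketch for a single photo might look like the following. It assumes predictions arrive as `(score, (subject_id, relation, object_id))` pairs and that predicted detections have already been matched to labeled boxes so the ids are comparable; in practice that matching step (commonly an IoU threshold) has to be defined and reported too. The function name and data layout are assumptions, not part of the challenge materials.

```python
def recall_at_k(predicted, ground_truth, k):
    """Share of labeled triples that appear among the k highest-scoring predictions.

    predicted: list of (score, (subject_id, relation, object_id))
    ground_truth: set of (subject_id, relation, object_id)
    """
    if not ground_truth:
        # Recall is undefined with no labels; returning 0.0 is a convention chosen here.
        return 0.0
    top_k = sorted(predicted, key=lambda p: p[0], reverse=True)[:k]
    hits = {triple for _, triple in top_k} & ground_truth
    return len(hits) / len(ground_truth)


predicted = [
    (0.9, (0, "left_of", 1)),
    (0.8, (1, "left_of", 2)),
    (0.3, (0, "above", 3)),
    (0.2, (2, "above", 3)),
]
ground_truth = {(0, "left_of", 1), (0, "above", 3), (2, "left_of", 3)}

r_at_2 = recall_at_k(predicted, ground_truth, 2)
r_at_3 = recall_at_k(predicted, ground_truth, 3)
```

Reporting the metric honestly also means stating K, how photos are averaged, and whether relations absent from the labels count against the model, since incomplete annotation is one reason recall is used for this task rather than precision.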
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Computer Vision Engineer
Shipping a scene-graph pipeline that goes beyond plain detection is the kind of next-step work CV engineers do at analytics companies moving from 'what's in the photo' to 'what's the story'.
This challenge sharpens
- scene-graph-generation
- object-detection
- relation-prediction
Applied AI Scientist
Comparing a richer representation against a baseline on real downstream questions is day-to-day work for an applied AI scientist on any product team weighing model upgrades.
This challenge sharpens
- scene-understanding
- evaluation
- relation-prediction
Machine Learning Engineer
Building a detector-plus-relation-head pipeline and running it on a labeled holdout with reproducible reporting maps directly to MLE work in computer-vision-heavy products.
This challenge sharpens
- object-detection
- pytorch
- evaluation