Open-Vocabulary Segmentation Benchmark for a Robotics R&D Lab
Overview
What this challenge is about.
Use a curated 200-image household scene set (publicly available HM3D renderings, or COCO images plus a handful of household prompts). Benchmark three open-vocabulary segmentation models: SAM with a CLIP classifier head, OWLv2-segment, and a small open-source variant. Evaluate per-prompt mean Intersection-over-Union (mIoU), inference latency, and memory on a single A10 GPU. Deliver a benchmark harness, a results table, and a 4-page research memo for the head of robotics research.
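As a minimal sketch of the per-prompt mIoU metric, assuming each prediction and ground truth is a binary mask of the same shape: the function names `mask_iou` and `per_prompt_miou` and the `(prompt, pred_mask, gt_mask)` record layout are hypothetical, not part of the brief. Scoring two empty masks as 1.0 is also an assumed convention that a real harness should state explicitly.

```python
from collections import defaultdict

import numpy as np


def mask_iou(pred, gt):
    """Intersection-over-Union of two binary masks with identical shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt masks must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: both masks empty counts as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def per_prompt_miou(records):
    """Average IoU per text prompt.

    records: iterable of (prompt, pred_mask, gt_mask) tuples (hypothetical layout).
    Returns a dict mapping each prompt to its mean IoU over all its records.
    """
    scores = defaultdict(list)
    for prompt, pred, gt in records:
        scores[prompt].append(mask_iou(pred, gt))
    return {prompt: float(np.mean(values)) for prompt, values in scores.items()}
```

Keeping the scores keyed by prompt, instead of averaging everything into one number, is what lets the results table show which household prompts each model handles poorly.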
The Brief
What you'll do, and what you'll demonstrate.
Pick the best open-vocabulary segmentation model for household robotics based on per-prompt mIoU, latency, and memory.
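One possible way to frame the selection, offered as a sketch and not as the required method, is to discard any model that another model beats on every axis and then argue the remaining trade-off in the memo. The `ModelResult` fields, their units, and the function names below are hypothetical, and the mIoU here is assumed to be a single aggregate per model.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    name: str
    miou: float        # higher is better
    latency_ms: float  # lower is better
    memory_gb: float   # lower is better


def dominates(a: ModelResult, b: ModelResult) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    no_worse = (
        a.miou >= b.miou
        and a.latency_ms <= b.latency_ms
        and a.memory_gb <= b.memory_gb
    )
    strictly_better = (
        a.miou > b.miou
        or a.latency_ms < b.latency_ms
        or a.memory_gb < b.memory_gb
    )
    return no_worse and strictly_better


def pareto_front(results: List[ModelResult]) -> List[ModelResult]:
    """Keep only the models that no other model dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]
```

If more than one model survives the filter, the choice depends on the robot's latency and memory budget, which is the judgment the memo has to defend.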
Earning criteria — what you'll demonstrate
- Design a fair benchmark across three distinct model families
- Evaluate open-vocabulary segmentation with prompt-level granularity
- Profile vision-language models for cost and memory
- Write a research memo that frames a multi-quarter investment
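For the profiling criterion, a latency harness might look like the sketch below. The name `measure_latency`, its parameters, and the warm-up count are assumptions for illustration. The `sync` hook exists because GPU work is launched asynchronously: with PyTorch models one would typically pass `torch.cuda.synchronize` so the timer waits for the kernels to finish, and read peak memory separately via `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`.

```python
import math
import statistics
import time


def measure_latency(fn, inputs, warmup=3, sync=None):
    """Time fn(x) for each x in inputs and return latency stats in milliseconds.

    sync: optional zero-argument callable run before and after each timed call
    (assumption: torch.cuda.synchronize when fn runs a PyTorch model on a GPU).
    """
    inputs = list(inputs)
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up runs are not timed; they absorb one-off setup costs.
    for x in inputs[:warmup]:
        fn(x)
    if sync is not None:
        sync()

    times_ms = []
    for x in inputs:
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn(x)
        if sync is not None:
            sync()
        times_ms.append((time.perf_counter() - start) * 1000.0)

    times_ms.sort()
    # Nearest-rank 95th percentile on the sorted timings.
    p95_index = max(0, math.ceil(0.95 * len(times_ms)) - 1)
    return {
        "n": len(times_ms),
        "mean_ms": statistics.fmean(times_ms),
        "median_ms": statistics.median(times_ms),
        "p95_ms": times_ms[p95_index],
    }
```

Running every model through the same function, on the same images and with the same warm-up, keeps the latency column of the results table comparable across the three families.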
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Rigorous open-vocabulary segmentation benchmarks are exactly the experimental work ML researchers ship at robotics and foundation-model labs.
This challenge sharpens
- open-vocabulary-segmentation
- benchmarking
- experiment-design
Research Scientist
Honest cross-family comparison with per-prompt analysis is the research-scientist discipline that foundation-model labs hire for.
This challenge sharpens
- vision-language-models
- benchmarking
- evaluation
Computer Vision Engineer
Foundation-model segmentation work increasingly defines what CV engineers ship; this benchmark builds the judgment to pick the right tool.
This challenge sharpens
- open-vocabulary-segmentation
- vision-language-models
- evaluation