Visual Question Answering for a Pediatric Radiology Workflow
Overview
What this challenge is about.
You receive about 8,000 publicly available pediatric chest X-rays with structured findings labels (anonymized; no PHI access required). Build a VQA pipeline that maps an (image, question) pair to a yes/no answer plus a localization map, that is, an attention or saliency map over the image. Fine-tune an open vision-language model such as LLaVA-1.6 or PaliGemma with parameter-efficient adapters (LoRA), and construct a 200-question evaluation set spanning four common finding categories.
Report accuracy, calibration (shown in a reliability diagram and summarized as Expected Calibration Error), per-category sensitivity and specificity, and the median time to answer per question. Success is per-category sensitivity of at least 0.80 at specificity of at least 0.85, with an Expected Calibration Error below 0.05 (5%). Make no clinical claims: the deliverable is a research-grade evaluation report for the advisory board.
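The calibration target above can be made concrete with a small sketch. It assumes the model emits a probability of "yes" for each question and uses one common binary convention: equal-width bins over that probability, comparing the mean predicted probability in each bin with the observed rate of "yes" labels. The function names, the bin count of 10, and the toy inputs are illustrative assumptions, not part of the challenge specification.

```python
def reliability_bins(p_yes, labels, n_bins=10):
    """Group predictions into equal-width bins over the predicted P(yes).

    Returns (mean_predicted, observed_yes_rate, count) for each non-empty bin;
    these triples are the points a reliability diagram would plot.
    """
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(p_yes, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[idx] += p
        hits[idx] += y
        counts[idx] += 1
    return [
        (sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i]
    ]


def expected_calibration_error(p_yes, labels, n_bins=10):
    """Count-weighted average gap between predicted and observed rates."""
    bins = reliability_bins(p_yes, labels, n_bins)
    total = sum(count for _, _, count in bins)
    return sum(
        count / total * abs(mean_p - rate) for mean_p, rate, count in bins
    )


if __name__ == "__main__":
    # Made-up values for illustration only; 1 = "yes", 0 = "no".
    toy_probs = [0.92, 0.85, 0.10, 0.30, 0.75, 0.05]
    toy_labels = [1, 1, 0, 0, 1, 0]
    ece = expected_calibration_error(toy_probs, toy_labels)
    print(f"ECE = {ece:.3f}, meets the < 0.05 target: {ece < 0.05}")
```

With only 200 evaluation questions, some bins will hold very few samples, so the bin count noticeably affects the estimate; stating the binning scheme in the report keeps the number interpretable.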
The Brief
What you'll do, and what you'll demonstrate.
Build a VQA prototype on pediatric chest X-rays that reaches per-category sensitivity ≥0.80 at specificity ≥0.85, with calibrated probabilities (Expected Calibration Error below 5%) and per-question attention maps.
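As a rough illustration of how the per-category targets might be checked, the sketch below tallies a confusion matrix per finding category and compares the resulting rates with the 0.80 and 0.85 thresholds. The record layout `(category, label, prediction)` and the placeholder category names are assumptions made for this example; the brief does not name the four categories.

```python
from collections import defaultdict


def per_category_rates(records):
    """Compute sensitivity and specificity per category.

    records: iterable of (category, label, prediction), with 1 = yes, 0 = no.
    A rate is None when the category has no positives (or no negatives).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for category, label, pred in records:
        if label == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[category][key] += 1

    rates = {}
    for category, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        rates[category] = {
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
        }
    return rates


def meets_targets(rates, min_sens=0.80, min_spec=0.85):
    """Return {category: bool}; a category with an undefined rate fails."""
    result = {}
    for category, r in rates.items():
        sens, spec = r["sensitivity"], r["specificity"]
        result[category] = (
            sens is not None
            and spec is not None
            and sens >= min_sens
            and spec >= min_spec
        )
    return result


if __name__ == "__main__":
    # Placeholder categories and made-up outcomes, for illustration only.
    toy = [("category_a", 1, 1), ("category_a", 0, 0), ("category_a", 0, 1)]
    print(meets_targets(per_category_rates(toy)))
```

With 200 questions split across four categories, each rate rests on a few dozen cases at most, so reporting confidence intervals next to the point estimates is worth considering.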
Earning criteria — what you'll demonstrate
- Adapt an open vision-language model to a clinical-style visual reasoning task with LoRA
- Measure calibration of a yes/no medical classifier with reliability diagrams and Expected Calibration Error
- Generate and qualitatively assess attention/saliency maps as explanation surfaces
- Communicate model limitations and dataset bias honestly to a non-ML clinical audience
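To see why LoRA counts as parameter-efficient, recall that it freezes a weight matrix W of shape (d_out, d_in) and trains only a low-rank update B·A, where A has shape (r, d_in) and B has shape (d_out, r). The sketch below compares the trainable parameter counts. The layer size of 4096 and the rank of 16 are illustrative assumptions, not the actual dimensions of LLaVA-1.6 or PaliGemma.

```python
def full_params(d_in, d_out):
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """Share of a layer's parameters that LoRA trains for that layer."""
    return lora_trainable_params(d_in, d_out, rank) / full_params(d_in, d_out)


if __name__ == "__main__":
    # Hypothetical projection layer; real model dimensions will differ.
    d_in, d_out, rank = 4096, 4096, 16
    print("full:", full_params(d_in, d_out))
    print("lora:", lora_trainable_params(d_in, d_out, rank))
    print(f"fraction: {trainable_fraction(d_in, d_out, rank):.4%}")
```

The rank trades adapter capacity against memory and overfitting risk, which matters when the training set is only about 8,000 images.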
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Fine-tuning an open vision-language model on a domain task and writing a careful, calibrated evaluation is the foundational deliverable expected of a junior ML researcher in healthtech or any domain-specific AI team.
This challenge sharpens
- vision-language-models
- lora-finetuning
- calibration
Research Scientist
Reporting per-category sensitivity/specificity with reliability diagrams and explicit limitations mirrors the rigor expected in a research scientist's first publication-ready evaluation.
This challenge sharpens
- evaluation
- calibration
- visual-question-answering
Applied AI Scientist
Translating a research-grade VQA evaluation into a board-ready advisory report with honest limitations is daily work for applied AI scientists in regulated industries.
This challenge sharpens
- evaluation
- visual-question-answering
- pytorch