Code

Prototype a Multimodal Visual-Question-Answering Demo

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You will take a small open-source vision-language model (e.g., LLaVA-1.5-7B or PaliGemma), prompt-engineer it for a warehouse visual-question-answering (VQA) task, and wrap it in a Gradio web demo. Construct a 200-question evaluation set with reference answers, covering four question types: counting, presence, condition, and location. Report accuracy per question type and document the failure modes you observe. Deliverables: the demo, an evaluation notebook, and a 10-slide deck for the client presentation.
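A practical first step in the prompt-engineering work is a small prompt builder that gives each question type its own instruction plus a shared grounding hint. The sketch assumes the LLaVA-1.5 chat template used by the llava-hf checkpoints (`USER: <image>\n... ASSISTANT:`); check the model card of the checkpoint you load, and note that PaliGemma has its own prompt conventions. `TYPE_INSTRUCTIONS`, `GROUNDING_HINT` and the "unknown" fallback are hypothetical wording to iterate on, not tuned prompts.

```python
# Hypothetical per-type instructions: starting points to iterate on, not tuned prompts.
TYPE_INSTRUCTIONS = {
    "counting": "Count only items you can actually see. Answer with a single integer.",
    "presence": "Answer 'yes' or 'no' based only on what is visible.",
    "condition": "Describe the visible condition in a few words, e.g. 'damaged' or 'intact'.",
    "location": "Say where the item is relative to visible landmarks such as racks, shelves, or pallets.",
}

# Grounding hint shared by every question type (assumed wording).
GROUNDING_HINT = (
    "You are inspecting a photo taken in a warehouse. "
    "Base your answer only on the image; if it cannot be determined, answer 'unknown'."
)


def build_prompt(question: str, question_type: str) -> str:
    """Build a LLaVA-1.5-style prompt for one warehouse question.

    Assumes the llava-hf template "USER: <image>\\n... ASSISTANT:"; verify it
    against the model card of the checkpoint you actually load.
    """
    if question_type not in TYPE_INSTRUCTIONS:
        raise ValueError(f"unknown question type: {question_type!r}")
    body = "\n".join([
        GROUNDING_HINT,
        TYPE_INSTRUCTIONS[question_type],
        f"Question: {question.strip()}",
    ])
    return f"USER: <image>\n{body} ASSISTANT:"


if __name__ == "__main__":
    print(build_prompt("How many pallets are on the left rack?", "counting"))
```

Keeping the per-type instructions in one dictionary makes it easy to change a single type's wording and re-run the evaluation to see whether that type's accuracy moves.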

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Build a working warehouse-VQA demo on a small vision-language model and quantify accuracy per question type.
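One way to quantify accuracy per question type is to normalize the model's free-text answers and score each type with a rule suited to it. This is a minimal sketch under stated assumptions: each record is a dict with hypothetical keys `type`, `prediction` and `reference`; counting answers are compared on the first integer mentioned; presence answers on the leading yes/no token; condition and location answers by exact match after normalization, which is deliberately strict.

```python
import re
from collections import defaultdict

# Map spelled-out small numbers to digits so "three" and "3" compare equal.
NUMBER_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
}
ARTICLES = {"a", "an", "the"}


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and articles, and convert number words to digits."""
    text = re.sub(r"[^\w\s]", " ", answer.lower())
    tokens = [NUMBER_WORDS.get(tok, tok) for tok in text.split() if tok not in ARTICLES]
    return " ".join(tokens)


def is_correct(prediction: str, reference: str, question_type: str) -> bool:
    """Score one answer with a rule chosen per question type (assumed rules)."""
    pred, ref = normalize(prediction), normalize(reference)
    if question_type == "counting":
        # Compare the first integer mentioned in each answer.
        pred_nums, ref_nums = re.findall(r"\d+", pred), re.findall(r"\d+", ref)
        return bool(pred_nums) and bool(ref_nums) and pred_nums[0] == ref_nums[0]
    if question_type == "presence":
        # Compare only the leading yes/no token.
        return pred.split()[:1] == ref.split()[:1]
    # Condition and location: strict exact match after normalization.
    return pred == ref


def accuracy_by_type(records):
    """Return ({type: accuracy}, {type: [failed records]}) for a list of dicts."""
    correct, total = defaultdict(int), defaultdict(int)
    failures = defaultdict(list)
    for rec in records:
        qtype = rec["type"]
        total[qtype] += 1
        if is_correct(rec["prediction"], rec["reference"], qtype):
            correct[qtype] += 1
        else:
            failures[qtype].append(rec)
    return {t: correct[t] / total[t] for t in total}, dict(failures)


records = [
    {"type": "counting", "prediction": "There are three pallets.", "reference": "3"},
    {"type": "counting", "prediction": "Two", "reference": "4"},
    {"type": "presence", "prediction": "Yes, there is a forklift.", "reference": "yes"},
    {"type": "location", "prediction": "On the top shelf", "reference": "top shelf"},
]
accuracy, failures = accuracy_by_type(records)
print(accuracy)  # {'counting': 0.5, 'presence': 1.0, 'location': 0.0}
```

Here "On the top shelf" is scored wrong against "top shelf", exactly the kind of borderline case to review by hand or handle with a looser matcher. The returned `failures` lists give a convenient starting point for the failure-mode write-up.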

Earning criteria: what you'll demonstrate

  • Apply a small open-source vision-language model to a domain VQA task
  • Prompt-engineer for grounded multimodal reasoning
  • Construct a balanced VQA evaluation set across question types
  • Present a working multimodal demo to a mixed client audience
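To keep the 200-question set balanced across question types, one option is to draft more candidate questions than needed and draw an equal number per type. This sketch assumes an even 50/50/50/50 split across counting, presence, condition and location, and a hypothetical candidate format (dicts with a `type` key); the fixed seed makes the draw reproducible, so the notebook's numbers can be regenerated.

```python
import random
from collections import defaultdict

QUESTION_TYPES = ("counting", "presence", "condition", "location")


def balanced_sample(candidates, total=200, seed=0):
    """Draw an equal number of questions per type from a larger candidate pool.

    `candidates` is a list of dicts with at least a "type" key (hypothetical format).
    Raises ValueError if the total does not split evenly or a type has too few candidates.
    """
    if total % len(QUESTION_TYPES):
        raise ValueError("total must split evenly across question types")
    per_type = total // len(QUESTION_TYPES)

    # Group candidates by question type.
    pools = defaultdict(list)
    for item in candidates:
        pools[item["type"]].append(item)

    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected = []
    for qtype in QUESTION_TYPES:
        pool = pools[qtype]
        if len(pool) < per_type:
            raise ValueError(f"need {per_type} {qtype!r} questions, have {len(pool)}")
        selected.extend(rng.sample(pool, per_type))
    return selected


# Example: 60 drafted candidates per type, 50 kept per type.
candidates = [{"type": t, "id": f"{t}-{i}"} for t in QUESTION_TYPES for i in range(60)]
eval_set = balanced_sample(candidates)
print(len(eval_set))  # 200
```

Raising an error when a type runs short, rather than silently topping up from other types, keeps the per-type accuracies comparable.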

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

AI Engineer

Shipping a working multimodal demo for a real client meeting is the AI engineer's signature deliverable at consultancies and AI-forward product teams.

This challenge sharpens

  • vision-language-models
  • demo-development
  • prompt-engineering

Prompt Engineer

Designing and iterating prompts for grounded multimodal reasoning is exactly the prompt-engineering skill set hiring managers screen for.

This challenge sharpens

  • prompt-engineering
  • vision-language-models
  • evaluation

Applied AI Scientist

Constructing a balanced VQA evaluation set and reporting per-question-type accuracy is the applied AI scientist's craft when shipping new capabilities.

This challenge sharpens

  • multimodal-perception
  • evaluation
  • vision-language-models

One more thing

You can put a credential on your CV by Friday.