Fuse Camera + Audio Cues for an Autonomous-Vehicle Edge Case
Overview
What this challenge is about.
You receive a curated dataset of 4,000 short clips (5 s each). Each clip has synchronized 8-camera, 360-degree video, 4-channel audio, and two labels: whether a siren-active emergency vehicle is present (yes/no) and its direction of approach (one of 8 bins). Build a visual-only baseline detector (a small CNN over per-camera frames) and a multimodal model that fuses the CNN features with features from a small audio-spectrogram encoder. Evaluate both on detection recall at a low false-positive rate (at most 1 false positive per 100 minutes of city driving) and on direction-bin accuracy. Deliver a 4-page comparison memo plus a Jupyter demo notebook.
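The recall-at-budget metric can be made concrete with a small sketch. It assumes one detection score per clip, that false positives are counted per negative clip, and that the negative clips stand in for city-driving time; the function name `recall_at_fp_budget` and its parameters are hypothetical, not part of the challenge materials.

```python
import math


def recall_at_fp_budget(scores, labels, clip_seconds=5.0, fp_per_100_min=1.0):
    """Return (recall, threshold, allowed_fp) under a false-positive budget.

    Assumption: each negative clip contributes `clip_seconds` of driving time,
    and a negative clip scoring above the threshold counts as one false positive.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one positive clip to compute recall")

    # Convert the time-based budget into a whole number of allowed false positives.
    negative_minutes = len(negatives) * clip_seconds / 60.0
    allowed_fp = math.floor(fp_per_100_min * negative_minutes / 100.0)

    # A clip is flagged when score > threshold. Using the (allowed_fp + 1)-th
    # highest negative score as the threshold keeps false positives within budget.
    if allowed_fp >= len(negatives):
        threshold = -math.inf
    else:
        threshold = negatives[allowed_fp]

    detected = sum(1 for s in positives if s > threshold)
    return detected / len(positives), threshold, allowed_fp
```

Under these assumptions the full dataset covers about 333 minutes (4,000 clips at 5 s), so the budget allows at most 3 false-positive clips even if every clip were negative. Reporting the chosen threshold alongside recall makes the operating point reproducible.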
The Brief
What you'll do, and what you'll demonstrate.
Build a multimodal perception model that fuses camera and audio cues to detect approaching emergency vehicles at urban intersections better than a camera-only baseline.
Earning criteria — what you'll demonstrate
- Apply CNN architectures for visual perception on multi-camera input
- Encode short audio segments via spectrogram features for downstream fusion
- Compare early vs. late fusion strategies on a real perception task
- Communicate safety-critical evaluation results to perception leadership
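The difference between early and late fusion named in the criteria above can be illustrated with a toy sketch. It assumes each clip has already been reduced to a visual feature vector and an audio feature vector, and it uses hand-set linear heads in place of trained networks; every name here (`early_fusion_score`, `late_fusion_score`, `audio_weight`) is hypothetical.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(features, weights, bias):
    """Plain dot product plus bias, standing in for a trained classifier head."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths must match")
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion_score(visual_feat, audio_feat, weights, bias):
    """Early fusion: concatenate the features, then apply one joint head."""
    fused = list(visual_feat) + list(audio_feat)
    return sigmoid(linear(fused, weights, bias))


def late_fusion_score(visual_feat, audio_feat, visual_head, audio_head,
                      audio_weight=0.5):
    """Late fusion: score each modality separately, then blend the probabilities.

    Each head is a (weights, bias) pair; `audio_weight` in [0, 1] sets how much
    the audio probability contributes to the blended result.
    """
    p_visual = sigmoid(linear(visual_feat, *visual_head))
    p_audio = sigmoid(linear(audio_feat, *audio_head))
    return (1.0 - audio_weight) * p_visual + audio_weight * p_audio
```

In the early variant the joint head can learn interactions between modalities, while the late variant keeps each modality's decision separate, which makes it easier to see which one drove a detection. A real submission would replace the linear heads with the CNN and audio-spectrogram encoder.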
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Computer Vision Engineer
Multimodal perception under safety constraints is exactly the work CV engineers ship on AV perception teams; this challenge produces a credible portfolio piece.
This challenge sharpens
- multimodal-perception
- convolutional-neural-networks
- feature-fusion
Applied AI Scientist
Comparing fusion strategies and quantifying safety-relevant trade-offs is the applied AI scientist's signature contribution to AV roadmaps.
This challenge sharpens
- multimodal-perception
- feature-fusion
- model-evaluation
ML Researcher
Designing a controlled fusion-strategy comparison on a real long-tail case mirrors the kind of project found in an entry-level ML researcher's portfolio.
This challenge sharpens
- audio-processing
- convolutional-neural-networks
- feature-fusion