Build a Speaker-Diarization Pipeline for a Legal-Tech Startup
Overview
What this challenge is about.
You receive 20 hours of de-identified hearing audio with ground-truth speaker labels (4 speaker classes per hearing). Build a speaker-diarization pipeline (pyannote-audio or similar) and tune it for the 2 to 6 speakers typical of hearings. Measure Diarization Error Rate (DER, the standard speaker-attribution metric) both overall and on the witness-vs-defense slice, which contains the hardest cases. The target is a DER under 12 percent on the full set. Deliver the pipeline, an eval report, and a 2-page memo on integrating with the existing ASR stack.
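For reference, DER is conventionally the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time, after mapping hypothesis speakers to reference speakers as favorably as possible. The sketch below illustrates that definition under simplifying assumptions that are not part of the brief: audio is already cut into fixed-length frames, each frame has at most one speaker (no overlapping speech), and no forgiveness collar is applied. The function name `der`, the `keep` argument, and the reading of a "slice" as the frames whose reference speaker belongs to a chosen set are all hypothetical choices made for illustration.

```python
from itertools import permutations


def der(ref, hyp, keep=None):
    """Frame-level DER for two equal-length label sequences.

    ref, hyp: per-frame speaker labels, with None meaning non-speech.
    keep: optional set of reference labels. When given, only frames whose
          reference label is in `keep` are scored (an assumed slice rule,
          which also means false alarms in reference silence are skipped).
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")

    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})

    # Count frames where each (hypothesis, reference) speaker pair co-occurs.
    overlap = {}
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            overlap[(h, r)] = overlap.get((h, r), 0) + 1

    # Brute-force the one-to-one mapping that maximizes agreement.
    # Fine for a handful of speakers; production toolkits typically use
    # the Hungarian algorithm instead.
    targets = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))
    mapping, best_score = {}, -1
    for perm in permutations(targets, len(hyp_spk)):
        score = sum(overlap.get((h, r), 0) for h, r in zip(hyp_spk, perm))
        if score > best_score:
            mapping, best_score = dict(zip(hyp_spk, perm)), score

    miss = false_alarm = confusion = total = 0
    for r, h in zip(ref, hyp):
        if keep is not None and r not in keep:
            continue
        if r is not None:
            total += 1
            if h is None:
                miss += 1
            elif mapping[h] != r:
                confusion += 1
        elif h is not None:
            false_alarm += 1

    if total == 0:
        raise ValueError("no reference speech in the scored region")
    return (miss + false_alarm + confusion) / total


# Toy example: 14 frames, 12 of them reference speech.
ref = ["judge"] * 4 + [None] * 2 + ["witness"] * 4 + ["defense"] * 4
hyp = ["A"] * 4 + [None] + ["B"] * 4 + ["C"] * 5

print(f"{der(ref, hyp):.3f}")                                # 0.167
print(f"{der(ref, hyp, keep={'witness', 'defense'}):.3f}")   # 0.125
```

In the toy example, one false-alarm frame and one confused frame out of 12 reference speech frames give an overall DER of about 16.7 percent, while the witness-vs-defense slice scores 1 error in 8 frames. For the actual eval report, an established scorer that handles overlapping speech and collars is likely a safer choice than a hand-rolled one.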
The Brief
What you'll do, and what you'll demonstrate.
Build a speaker-diarization pipeline that brings DER under 12 percent on legal-hearing audio.
Earning criteria — what you'll demonstrate
- Build a modern speaker-diarization pipeline
- Evaluate diarization with DER and sliced analysis
- Tune diarization for a known-cardinality speaker setup
- Document integration with an existing ASR stack
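As one illustration of what the ASR integration in the last criterion could involve, the sketch below attaches a speaker label to each recognized word by picking the diarization turn that overlaps it most in time. The brief does not describe the existing ASR stack, so everything here is an assumption: that the ASR emits word-level timestamps, that diarization output can be reduced to (start, end, speaker) turns, and the names `Turn`, `Word`, and `assign_speakers` themselves.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization segment (hypothetical shape)."""
    start: float
    end: float
    speaker: str


@dataclass
class Word:
    """One ASR word with timestamps in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def assign_speakers(words, turns, unknown="UNK"):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all get the `unknown` label.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            # Length of the time intersection; negative if disjoint.
            shared = min(word.end, turn.end) - max(word.start, turn.start)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, word.text))
    return labeled


turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 4.0, "SPEAKER_01")]
words = [
    Word(0.2, 0.6, "Please"),
    Word(1.8, 2.1, "state"),   # straddles the turn boundary
    Word(2.5, 2.9, "yes"),
    Word(5.0, 5.3, "noise"),   # falls outside every turn
]

for speaker, text in assign_speakers(words, turns):
    print(speaker, text)
```

Maximum overlap is a simple rule that behaves predictably at turn boundaries, which is where attribution errors tend to concentrate. The memo could discuss alternatives, such as snapping turn boundaries to word boundaries, if the ASR stack exposes the necessary timing detail.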
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
NLP Engineer
Owning the diarization layer on top of ASR is day-to-day work for NLP and speech engineers at voice-transcription startups.
This challenge sharpens
- speaker-diarization
- speech-recognition
- pyannote
Machine Learning Engineer
Integrating two ML components (ASR + diarization) into a shippable pipeline is core MLE craft.
This challenge sharpens
- pyannote
- evaluation
- pytorch
Applied AI Scientist
Sliced evaluation on the hard cases (witness-vs-defense) and a written integration memo are bread-and-butter applied-AI work.
This challenge sharpens
- speaker-diarization
- audio-processing
- evaluation