Evaluate Speech-to-Text Quality for a Contact-Center Analytics Vendor
Overview
What this challenge is about.
You receive 200 anonymized call-recording snippets (2-4 minutes each, about 67 per language) with reference transcripts, plus a domain glossary of about 600 product terms. Run all three candidate engines on every snippet and report Word Error Rate (WER), domain-term recall (DTR), and named-entity F1 over a curated 80-entity test set per language. Also measure processing latency as wall-clock time per hour of audio. Recommend one engine per language, and discuss whether a mixed-engine strategy would beat any single engine on cost-adjusted accuracy.
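The metrics above can be sketched in plain Python. WER here is the standard word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, pooled over the corpus. The brief does not define DTR or entity F1 precisely, so the versions below are assumptions: DTR counts glossary-term occurrences in the reference that also appear in the hypothesis (clipped per snippet), and entity F1 is exact-match micro F1 over `(snippet_id, label, text)` tuples that some upstream step has already extracted. The lowercase-and-strip-punctuation `normalize` helper is likewise an assumed normalization; if any of the languages is written without spaces between words, whitespace tokenization breaks and a character-level metric would be needed instead.

```python
import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, keep word-like tokens, drop punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # empty ref prefix: j insertions
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)    # empty hyp prefix: i deletions
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            )
        prev = cur
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = sum(word_edit_distance(normalize(r), normalize(h)) for r, h in pairs)
    words = sum(len(normalize(r)) for r, _ in pairs)
    return edits / words


def count_term(tokens: list[str], term: list[str]) -> int:
    """Occurrences of a (possibly multi-word) term as a contiguous token run."""
    n = len(term)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == term)


def domain_term_recall(pairs, glossary) -> float:
    """Assumed DTR: reference term occurrences also found in the hypothesis,
    with hypothesis counts clipped at the reference count per snippet."""
    terms = [t for t in (normalize(g) for g in glossary) if t]
    hit = total = 0
    for ref, hyp in pairs:
        r, h = normalize(ref), normalize(hyp)
        for t in terms:
            in_ref = count_term(r, t)
            if in_ref:
                total += in_ref
                hit += min(in_ref, count_term(h, t))
    return hit / total if total else float("nan")


def entity_f1(gold, pred) -> float:
    """Exact-match micro F1 over (snippet_id, label, text) tuples."""
    g, p = Counter(gold), Counter(pred)
    tp = sum((g & p).values())
    precision = tp / sum(p.values()) if p else 0.0
    recall = tp / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def seconds_per_audio_hour(processing_s: float, audio_s: float) -> float:
    """Wall-clock processing time scaled to one hour of audio."""
    return 3600.0 * processing_s / audio_s
```

Pooling edits across snippets weights long calls more heavily than averaging per-snippet WER would; whichever choice you make, apply it identically to every engine.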
The Brief
What you'll do, and what you'll demonstrate.
Pick the best speech-to-text engine (or mix of engines) for a multilingual contact-center analytics product, judged on cost-adjusted accuracy.
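One way to frame the single-engine versus mixed-engine question is sketched below. It assumes you already have a cost-adjusted score per engine and language (higher is better) and each language's share of call volume; the flat `overhead_per_extra_vendor` penalty is a hypothetical stand-in for the integration and contract cost of running several vendors. The engine names, language keys, and numbers are toy placeholders, not results.

```python
def best_single_engine(scores, share):
    """scores[engine][language]: cost-adjusted score, higher is better.
    share[language]: fraction of call volume (assumed to sum to 1)."""
    totals = {e: sum(share[lang] * per_lang[lang] for lang in share)
              for e, per_lang in scores.items()}
    best = max(totals, key=totals.get)
    return best, totals[best]


def best_mix(scores, share, overhead_per_extra_vendor=0.0):
    """Route each language to its top engine, then subtract a flat
    (hypothetical) penalty for every vendor beyond the first."""
    picks = {lang: max(scores, key=lambda e: scores[e][lang]) for lang in share}
    value = sum(share[lang] * scores[picks[lang]][lang] for lang in share)
    n_vendors = len(set(picks.values()))
    return picks, value - overhead_per_extra_vendor * (n_vendors - 1)


# Toy inputs for illustration only; not results from this challenge.
scores = {
    "engine_a": {"lang1": 0.90, "lang2": 0.70, "lang3": 0.80},
    "engine_b": {"lang1": 0.80, "lang2": 0.85, "lang3": 0.75},
}
share = {"lang1": 0.5, "lang2": 0.3, "lang3": 0.2}
single, single_value = best_single_engine(scores, share)
picks, mix_value = best_mix(scores, share, overhead_per_extra_vendor=0.02)
print(single, round(single_value, 3), picks, round(mix_value, 3))
```

Without the overhead term, the per-language mix can never score below the best single engine, so the substantive question is whether its margin survives a realistic estimate of multi-vendor cost.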
Earning criteria: what you'll demonstrate
- Apply standard speech-to-text evaluation metrics across multiple languages
- Quantify domain-term recall and named-entity accuracy alongside WER
- Combine accuracy and cost into a procurement-grade recommendation
- Communicate licensing-relevant findings to a procurement-aware stakeholder
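Combining accuracy and cost into a single figure requires weights that the brief leaves open. The sketch below assumes a linear composite of (1 - WER), DTR, and entity F1, minus a per-dollar penalty on price per audio hour; the `EngineResult` type, the default weights, and `penalty_per_usd` are hypothetical choices to agree with the stakeholder, not a prescribed formula.

```python
from dataclasses import dataclass


@dataclass
class EngineResult:
    wer: float                 # word error rate, lower is better
    dtr: float                 # domain-term recall, higher is better
    ne_f1: float               # named-entity F1, higher is better
    usd_per_audio_hour: float  # list or negotiated price (assumed input)


def accuracy_composite(r: EngineResult, w_wer=0.4, w_dtr=0.3, w_f1=0.3) -> float:
    """Hypothetical weighted accuracy; 1 - WER is clipped at 0 because WER
    can exceed 1 when a hypothesis has many insertions."""
    return w_wer * max(0.0, 1.0 - r.wer) + w_dtr * r.dtr + w_f1 * r.ne_f1


def cost_adjusted(r: EngineResult, penalty_per_usd=0.05, **weights) -> float:
    """Composite accuracy minus a linear (hypothetical) price penalty."""
    return accuracy_composite(r, **weights) - penalty_per_usd * r.usd_per_audio_hour
```

Because the ranking can flip as the weights or penalty change, reporting how the recommendation shifts across a few plausible settings is usually more useful to a procurement audience than a single score.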
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Applied AI Scientist
Multilingual model bake-offs with cost-adjusted recommendations are bread-and-butter work for applied AI scientists at B2B AI vendors.
This challenge sharpens
- speech-recognition
- benchmarking
- multilingual-evaluation
NLP Engineer
Hands-on evaluation across multiple speech engines is core NLP-engineer territory; the named-entity sub-task bridges directly into NER work.
This challenge sharpens
- speech-recognition
- model-evaluation
- multilingual-evaluation
ML Researcher
Designing fair multi-engine benchmarks with confidence intervals is the kind of rigor ML researchers in industry are graded on.
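With only about 67 snippets per language, a WER gap between two engines may be sampling noise. A paired bootstrap is one common way to put a confidence interval on that gap: resample snippets with replacement, recompute both engines' corpus WER on the same resample, and take percentiles of the difference. The function name and its inputs (per-snippet edit counts and reference word counts, computed separately for each language) are assumptions for this sketch.

```python
import random


def paired_bootstrap_wer_diff(edits_a, edits_b, ref_words,
                              n_boot=2000, alpha=0.05, seed=0):
    """(1 - alpha) percentile interval for WER_a - WER_b, resampling
    the same snippet indices for both engines (paired design)."""
    n = len(ref_words)
    if not (len(edits_a) == len(edits_b) == n):
        raise ValueError("all inputs must cover the same snippets")
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        words = sum(ref_words[i] for i in idx)  # assumes positive word counts
        wer_a = sum(edits_a[i] for i in idx) / words
        wer_b = sum(edits_b[i] for i in idx) / words
        diffs.append(wer_a - wer_b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

An interval that excludes zero suggests the gap is not just an artifact of which snippets happened to be sampled; one that straddles zero means the two engines should be treated as tied on WER.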
This challenge sharpens
- benchmarking
- sequence-models
- model-evaluation