Overview
What this challenge is about.
You receive 28,000 labeled emails, skewed toward English and Mandarin. Try at least two approaches: (1) a fine-tuned multilingual transformer (XLM-RoBERTa or mDeBERTa) and (2) a few-shot LLM prompt with a smaller open model. Engineer features suited to short emails, such as combining subject and body and adding a detected-language signal. Evaluate both per language and overall, and calibrate the predicted probabilities so that the 0.7 deferral threshold means what it says: a confidence of 0.7 should correspond to being right about 70% of the time. Deliver the model, an eval report that includes a per-language confusion matrix, and a 4-page memo for the support lead.
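The brief does not prescribe a calibration method. One common choice is temperature scaling: divide every logit vector by a single scalar T, fitted on held-out data to minimise negative log-likelihood, then apply the deferral rule to the rescaled top-class probability. The sketch below assumes you already have per-email logits from whichever model you trained; the function names, the grid search (a simple stand-in for a gradient-based fit), and the grid range are illustrative assumptions, not part of the brief.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of one logit vector after dividing by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logits_list: Sequence[Sequence[float]], labels: Sequence[int],
             temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, label in zip(logits_list, labels):
        loss -= math.log(max(softmax(logits, temperature)[label], 1e-12))
    return loss / len(labels)


def fit_temperature(logits_list: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Hypothetical helper: grid-search the temperature that minimises held-out NLL."""
    grid = [0.5 + 0.05 * i for i in range(91)]  # temperatures 0.5 .. 5.0
    return min(grid, key=lambda t: mean_nll(logits_list, labels, t))


def route(logits: Sequence[float], temperature: float,
          threshold: float = 0.7) -> Tuple[int, float, bool]:
    """Return (predicted class, calibrated confidence, whether to defer)."""
    probs = softmax(logits, temperature)
    confidence = max(probs)
    return probs.index(confidence), confidence, confidence < threshold


if __name__ == "__main__":
    # Toy held-out set: the raw model is ~96% confident but right only 3 times out of 4.
    val_logits = [[4.0, 0.0, 0.0]] * 4
    val_labels = [0, 0, 0, 1]
    t = fit_temperature(val_logits, val_labels)
    print(f"fitted temperature: {t:.2f}")
    print(route([4.0, 0.0, 0.0], t))  # stays above 0.7, so it is auto-routed
    print(route([1.0, 0.0, 0.0], t))  # falls below 0.7, so it is deferred
```

Because dividing by a positive T never changes which logit is largest, temperature scaling adjusts confidence without changing accuracy; fit T on a split the model never trained on.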
The Brief
What you'll do, and what you'll demonstrate.
Beat 85% macro F1 on multilingual email classification, with confidence scores calibrated well enough to drive an automated routing system.
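Two numbers capture this target: macro F1, the unweighted mean of per-class F1 so that rare categories count as much as common ones, and a calibration measure. The brief does not name one; this sketch assumes expected calibration error (ECE) with equal-width confidence bins, a common choice. It is pure Python so it runs without dependencies; in practice `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same macro F1.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is >= 1 for any label that appears.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also takes a confidence of exactly 0.
        members = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == lo)]
        if not members:
            continue
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        avg_conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
    print(expected_calibration_error([0.9, 0.9], [True, False]))
```

Reporting ECE before and after calibration is one way to show that the 0.7 threshold tracks real accuracy.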
Earning criteria — what you'll demonstrate
- Apply multilingual transformers to a real classification problem
- Compare fine-tuning vs in-context learning fairly
- Calibrate model probabilities for routing decisions
- Diagnose per-language bias in multilingual models
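A per-language breakdown is one way to surface that bias: slice predictions by language, build a confusion matrix per slice, and compare per-class recall across languages. The sketch assumes each evaluation email already carries a language tag from whatever detector you use; the category names are hypothetical. `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=labels)` would produce the same matrices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, both in `labels` order."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_language_report(langs: Sequence[str], y_true: Sequence[str],
                        y_pred: Sequence[str], labels: Sequence[str]) -> Dict[str, dict]:
    """Support, confusion matrix, and per-class recall for each language slice."""
    slices = defaultdict(lambda: ([], []))
    for lang, t, p in zip(langs, y_true, y_pred):
        slices[lang][0].append(t)
        slices[lang][1].append(p)
    report = {}
    for lang, (ts, ps) in sorted(slices.items()):
        matrix = confusion_matrix(ts, ps, labels)
        # Recall per class; None when the class never occurs in this language slice.
        recall = {label: (row[i] / sum(row) if sum(row) else None)
                  for i, (label, row) in enumerate(zip(labels, matrix))}
        report[lang] = {"n": len(ts), "confusion": matrix, "recall": recall}
    return report


if __name__ == "__main__":
    # Hypothetical labels and a tiny hand-made evaluation set.
    labels = ["billing", "fraud"]
    langs = ["en", "en", "zh", "zh"]
    y_true = ["billing", "fraud", "billing", "fraud"]
    y_pred = ["billing", "fraud", "fraud", "fraud"]
    for lang, row in per_language_report(langs, y_true, y_pred, labels).items():
        print(lang, row)
```

In this toy example the `zh` slice never recovers `billing`, the kind of language-specific gap that a single overall score can mask when one language dominates the data.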
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
NLP Engineer
Building production-ready multilingual classifiers with calibrated confidence is day-to-day work for NLP engineers at fintechs, marketplaces, and global SaaS companies.
This challenge sharpens
- text-classification
- multilingual-nlp
- transformers
Machine Learning Engineer
Calibrating models for routing decisions, and writing the memo that gets them adopted, is core MLE work on operations and support automation teams.
This challenge sharpens
- calibration
- evaluation
- pytorch
Applied AI Scientist
Comparing fine-tuning against few-shot prompting on a real fintech dataset, then translating the result into a deployment decision, is exactly the work of an applied AI scientist.
This challenge sharpens
- transformers
- evaluation
- calibration