Overview
What this challenge is about.
Use SNLI/MNLI/ANLI as starting data and curate 200 domain-specific HR examples (synthetic or anonymized) for fine-tuning. Fine-tune a small encoder (DeBERTa-v3-base or similar), calibrate its outputs, and build a UI that flags job description (JD) sentences as 'likely violating', 'unsure', or 'likely fine' against a small rule set. Evaluate on a held-out 50-example test set and report per-label precision and recall. Deliver the fine-tuned model, a Streamlit UI, and a 3-page handover doc for the legal team.
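As a rough illustration of the flagging and evaluation steps described above, the sketch below maps a calibrated violation probability to one of the three flags and computes per-label precision and recall. The function names, the 0.2 and 0.8 thresholds, and the toy data are all hypothetical, and it assumes the held-out set is annotated with the same three labels the UI shows; the challenge itself does not prescribe any of these.

```python
from collections import Counter

LABELS = ("likely violating", "unsure", "likely fine")

def triage(p_violation, low=0.2, high=0.8):
    """Map a calibrated violation probability to one of three flags.

    The low/high thresholds are illustrative placeholders, not values
    taken from the challenge.
    """
    if p_violation >= high:
        return "likely violating"
    if p_violation <= low:
        return "likely fine"
    return "unsure"

def per_label_precision_recall(gold, pred):
    """Return {label: (precision, recall)}; None where a value is undefined."""
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    pred_count = Counter(pred)
    gold_count = Counter(gold)
    report = {}
    for label in LABELS:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else None
        recall = true_pos[label] / gold_count[label] if gold_count[label] else None
        report[label] = (precision, recall)
    return report

# Toy data (made up for illustration only).
probs = [0.95, 0.85, 0.5, 0.1, 0.05, 0.3]
gold = ["likely violating", "likely fine", "unsure",
        "likely fine", "likely fine", "likely violating"]
pred = [triage(p) for p in probs]
report = per_label_precision_recall(gold, pred)

for label, (precision, recall) in report.items():
    print(label, precision, recall)
```

With only 50 test examples, each per-label figure rests on a handful of sentences, so it may be worth reporting the raw counts alongside the ratios.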
The Brief
What you'll do, and what you'll demonstrate.
Ship a calibrated NLI-based triage tool that cuts legal-review hours by 40 percent without missing high-risk JD sentences.
Earning criteria — what you'll demonstrate
- Fine-tune an NLI model on a domain-specific extension
- Calibrate classifier outputs and pick operational thresholds
- Build an honest UI that supports human-in-the-loop review
- Hand a model off to a non-engineering team responsibly
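One common way to approach the calibration criterion is temperature scaling: divide the logits by a single scalar T, chosen to minimize negative log-likelihood on a held-out validation set. The sketch below is a minimal pure-Python version using a grid search; the helper names, the grid range, the toy logits, and the label index order are assumptions for illustration, and other calibration methods would satisfy the criterion equally well.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, y in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[y])
    return loss / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes validation NLL."""
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

# Toy validation logits (made up). The index order
# 0 = entailment, 1 = neutral, 2 = contradiction is an assumption.
val_logits = [[6.0, 0.0, -1.0], [5.0, 0.5, 0.0], [0.0, 5.5, 0.0], [4.0, 0.0, 3.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", T)
print("calibrated probs:", softmax(val_logits[0], T))
```

Because two of the four toy predictions are confidently wrong, the fitted temperature comes out above 1, which softens the probabilities. Operational thresholds for 'likely violating' and 'likely fine' would then be chosen on the calibrated probabilities rather than the raw ones.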
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
NLP Engineer
Domain-adapted NLI with calibration and a human-in-the-loop UI is typical of the work an NLP engineer does at an HR-tech or compliance-AI company.
This challenge sharpens
- natural-language-inference
- transformer-models
- fine-tuning
AI Safety Researcher
Building calibrated classifiers that explicitly support human review is a safety-aware engineering pattern that AI safety researchers practice.
This challenge sharpens
- calibration
- evaluation
- natural-language-inference
Applied AI Scientist
Turning a model into a workflow change that legal teams will actually adopt is core applied AI scientist work.
This challenge sharpens
- fine-tuning
- calibration
- evaluation