Overview
What this challenge is about.
You receive 500 anonymized clinical trial protocol PDFs (already OCR-ed to text) and 1,200 labeled question-answer pairs in which each answer is an exact text span. Build an extractive QA system two ways: a baseline that uses prompt-engineered span extraction with a strong instruction-tuned LLM, and an alternative that fine-tunes a small encoder model (e.g., DeBERTa-v3) on the SQuAD-style data. Evaluate both on Exact Match (EM) and F1 on a 200-pair holdout, plus a strict no-hallucination check (the predicted span must appear verbatim in the source). Recommend one approach for production. Success means F1 above 80, EM above 65, and 100 percent verbatim spans on the holdout.
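To make the three success numbers concrete, here is a minimal scoring sketch. It assumes the commonly used SQuAD-style normalization (lowercasing, dropping punctuation and the articles a/an/the, collapsing whitespace) and a hypothetical record layout with `prediction`, `gold`, and `context` keys; neither is specified by the challenge, so adjust both to the actual data.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """SQuAD-style normalization (an assumption, not mandated by the brief)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def is_verbatim(prediction: str, context: str) -> bool:
    """Strict check: the raw prediction must be a non-empty substring of the source."""
    return bool(prediction) and prediction in context


def evaluate(examples):
    """Return EM, F1, and verbatim rate as percentages.

    `examples` is a hypothetical list of dicts with keys
    'prediction', 'gold', and 'context'.
    """
    n = len(examples)
    em = sum(exact_match(ex["prediction"], ex["gold"]) for ex in examples)
    f1 = sum(token_f1(ex["prediction"], ex["gold"]) for ex in examples)
    verbatim = sum(is_verbatim(ex["prediction"], ex["context"]) for ex in examples)
    return {"em": 100 * em / n, "f1": 100 * f1 / n, "verbatim": 100 * verbatim / n}
```

Note that the verbatim check runs on the raw prediction, not the normalized one, so a span that scores EM = 1 after normalization can still fail the no-hallucination check.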
The Brief
What you'll do, and what you'll demonstrate.
Build an extractive QA system over clinical trial protocols with strict verbatim-span guarantees and competitive F1.
Earning criteria — what you'll demonstrate
- Implement extractive QA via prompting and via fine-tuning, and compare them
- Apply EM and F1 metrics correctly for span-extraction tasks
- Build a verbatim-span guarantee at inference time
- Communicate model trade-offs to non-ML domain customers
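One way to approach the verbatim-span guarantee at inference time is to never return the model's string directly, and instead return a slice of the source text located from it. The sketch below is an assumption about how such a guard could look, not a prescribed design: the function names `locate_span` and `guarded_answer` are hypothetical, and the whitespace-tolerant fallback is one possible way to handle line breaks in OCR-ed text.

```python
import re
from typing import Optional, Tuple


def locate_span(candidate: str, source: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the candidate in the source, or None."""
    candidate = candidate.strip()
    if not candidate:
        return None
    # First try an exact substring match.
    start = source.find(candidate)
    if start != -1:
        return start, start + len(candidate)
    # Fallback (assumption): tolerate whitespace differences only,
    # such as a line break where the model wrote a space.
    pattern = r"\s+".join(re.escape(tok) for tok in candidate.split())
    match = re.search(pattern, source)
    if match:
        return match.start(), match.end()
    return None


def guarded_answer(candidate: str, source: str) -> Optional[str]:
    """Return a slice of the source, or None to abstain.

    Because the result is always source[start:end], any non-None
    answer is verbatim by construction.
    """
    offsets = locate_span(candidate, source)
    if offsets is None:
        return None
    start, end = offsets
    return source[start:end]
```

For example, with the source `"Participants receive 10 mg\nonce daily for 12 weeks."`, the candidate `"10 mg once daily"` resolves to the source slice containing the original line break, while `"20 mg"` yields `None`. Abstaining trades EM and F1 for the verbatim guarantee, which is a trade-off worth documenting in the final recommendation.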
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
NLP Engineer
Owning a span-extraction system with no-hallucination guarantees is the kind of disciplined NLP work pharma-adjacent companies hire for.
This challenge sharpens
- extractive-qa
- reading-comprehension
- hallucination-prevention
Machine Learning Engineer
Fine-tuning an encoder model and shipping with a verbatim-span guarantee is core MLE craft.
This challenge sharpens
- model-finetuning
- pytorch
- evaluation
Applied AI Scientist
Comparing prompt engineering with fine-tuning and recommending one with documented trade-offs is the judgement work of an applied AI scientist.
This challenge sharpens
- extractive-qa
- evaluation
- hallucination-prevention