Extractive QA on Clinical Trial Protocols

Free · Verified credential · 2 weeks · Advanced

Overview

What this challenge is about.

You receive 500 anonymized clinical trial protocols (PDFs already OCR-ed to text) and 1,200 labeled question-answer pairs, where each answer is an exact text span. Build an extractive QA system two ways: a baseline that uses prompt-engineered span extraction with a strong instruction-tuned LLM, and an alternative that fine-tunes a small encoder model (e.g., DeBERTa-v3) on the SQuAD-style data. Evaluate both on Exact Match (EM) and F1 over a 200-pair holdout, plus a strict no-hallucination check: every predicted span must appear verbatim in the source text. Then recommend one approach for production. Success means F1 above 80, EM above 65, and 100 percent verbatim spans on the holdout.
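EM and F1 for span extraction are commonly computed with SQuAD-style answer normalization. The sketch below assumes that convention (lowercase, strip punctuation and the articles a/an/the, collapse whitespace) and a single gold span per question; the challenge does not name its exact scorer, so `evaluate` and `meets_targets` are hypothetical helpers, not the official grader. It also folds in the verbatim check and the three success thresholds.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold span."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty is a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs):
    """pairs: list of (prediction, gold, source_context). Returns percentages.

    An empty prediction (abstention) is trivially a substring of the source, so it
    never counts as a hallucination, but it still scores 0 EM/F1 against a non-empty gold.
    """
    n = len(pairs)
    em = 100.0 * sum(exact_match(p, g) for p, g, _ in pairs) / n
    f1 = 100.0 * sum(f1_score(p, g) for p, g, _ in pairs) / n
    verbatim = 100.0 * sum(p in c for p, _, c in pairs) / n
    return {"em": em, "f1": f1, "verbatim": verbatim}


def meets_targets(scores) -> bool:
    """Success thresholds from the brief: F1 > 80, EM > 65, 100% verbatim."""
    return scores["f1"] > 80 and scores["em"] > 65 and scores["verbatim"] == 100.0
```

Note that EM and F1 are computed on normalized text, while the verbatim check deliberately uses the raw prediction: a span can score a perfect EM after normalization and still fail the no-hallucination check if the model altered casing or punctuation.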

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Build an extractive QA system over clinical trial protocols with strict verbatim-span guarantees and competitive F1.
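One way to enforce a verbatim-span guarantee at inference time, especially for the LLM baseline, is to never return model text directly: locate the prediction in the source, return the slice of the source instead, and abstain with an empty answer when it cannot be found. This is a minimal sketch under that assumption; `locate_span` and `guarded_answer` are hypothetical names. The whitespace-tolerant fallback (useful when OCR line breaks differ from the LLM's spacing) is a design choice you may want to drop under the strictest reading of "verbatim".

```python
import re
from typing import Optional, Tuple


def locate_span(prediction: str, context: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the prediction in context, or None."""
    pred = prediction.strip()
    if not pred:
        return None
    start = context.find(pred)
    if start != -1:
        return start, start + len(pred)
    # Fallback (assumption): allow only whitespace differences by matching the
    # same tokens separated by any run of whitespace in the source.
    pattern = r"\s+".join(re.escape(tok) for tok in pred.split())
    match = re.search(pattern, context)
    if match:
        return match.start(), match.end()
    return None


def guarded_answer(prediction: str, context: str) -> str:
    """Return a span sliced from context, or "" (abstain) if it cannot be grounded."""
    offsets = locate_span(prediction, context)
    if offsets is None:
        return ""
    start, end = offsets
    # Slicing the source, not echoing the model, makes the output verbatim by construction.
    return context[start:end]
```

Abstaining costs EM and F1 on that question, but it protects the 100 percent verbatim requirement; the rate of abstentions on the holdout is worth reporting alongside the two metrics.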

Earning criteria — what you'll demonstrate

Implement extractive QA via prompting and via fine-tuning, and compare them
Apply EM and F1 metrics correctly for span-extraction tasks
Build a verbatim-span guarantee at inference time
Communicate model trade-offs to non-ML domain customers
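For the fine-tuning route, an encoder QA head produces a start score and an end score per token, and the answer is the best valid (start, end) pair inside the passage. Because the result is a pair of indices mapped back through character offsets, the span is a slice of the source by construction. The sketch below assumes per-token offsets like those a Hugging Face fast tokenizer returns with `return_offsets_mapping=True`; `best_span`, `span_to_text`, and the `max_answer_len=30` default are illustrative assumptions, not the challenge's prescribed method.

```python
from typing import Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    context_mask: Sequence[bool],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the inclusive token span (start, end) with the highest start+end logit sum.

    Only passage tokens (context_mask[i] is True) are eligible, and the span
    must satisfy start <= end < start + max_answer_len.
    """
    best, best_score = None, float("-inf")
    n = len(start_logits)
    for s in range(n):
        if not context_mask[s]:
            continue
        for e in range(s, min(n, s + max_answer_len)):
            if not context_mask[e]:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    if best is None:
        raise ValueError("no eligible passage tokens")
    return best


def span_to_text(context: str, offsets: Sequence[Tuple[int, int]], start: int, end: int) -> str:
    """Map inclusive token indices back to a character slice of the original context."""
    return context[offsets[start][0]:offsets[end][1]]
```

The exhaustive loop is kept for clarity; restricting the search to the top-k start and end positions is a common speed-up. Long protocols that exceed the encoder's input length would need to be split into overlapping windows, scoring each window this way and keeping the highest-scoring span.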

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Question Answering and Conversational Systems

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

NLP Engineer
AI Engineering

NLP Engineer

Owning a span-extraction system with no-hallucination guarantees is the kind of disciplined NLP work pharma-adjacent companies hire for.

This challenge sharpens

extractive-qa
reading-comprehension
hallucination-prevention

Machine Learning Engineer

Fine-tuning an encoder model and shipping with a verbatim-span guarantee is core MLE craft.

This challenge sharpens

model-finetuning
pytorch
evaluation

Applied AI Scientist

Comparing prompt engineering with fine-tuning and recommending one approach with documented trade-offs is applied-AI-scientist judgement work.

This challenge sharpens

extractive-qa
evaluation
hallucination-prevention

One more thing

You can put a credential on your CV by Friday.