Build a Domain-Specific Named-Entity Recognizer for Legal Contracts
Overview
What this challenge is about.
Start from a strong English NER base (a spaCy transformer pipeline or LegalBERT). Fine-tune it on the provided dataset of 1,200 labeled contracts covering the 9 entity types. Handle long contracts (often 30+ pages) with an explicit context-window strategy such as sliding windows or hierarchical inference. Evaluate on 200 held-out contracts, reporting per-entity F1 and document-level coverage (the percentage of entities found per document). Deliver the trained model, an inference script that handles long inputs, and a 4-page integration spec that the product team can build the contract viewer around.
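To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It assumes the contract is already tokenized and that a hypothetical `predict_window` callable (standing in for the fine-tuned model) returns spans as `(start, end, label)` tuples in window-local token offsets. The window size, stride, and the simple union-based merge are illustrative choices, not requirements of the challenge.

```python
from typing import Callable, Iterator, List, Set, Tuple

# (start_token, end_token_exclusive, label), in token offsets
Span = Tuple[int, int, str]


def sliding_windows(n_tokens: int, window: int = 512,
                    stride: int = 384) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) token ranges that cover the document with overlap."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    start = 0
    while True:
        end = min(start + window, n_tokens)
        yield start, end
        if end >= n_tokens:
            break
        start += stride


def predict_long(tokens: List[str],
                 predict_window: Callable[[List[str]], List[Span]],
                 window: int = 512, stride: int = 384) -> List[Span]:
    """Run a window-level predictor over a long document and merge the results.

    `predict_window` is a hypothetical stand-in for the fine-tuned model.
    Merging here is a plain set union: identical spans found in overlapping
    windows collapse into one. Conflicting or partial spans are NOT resolved.
    """
    merged: Set[Span] = set()
    for start, end in sliding_windows(len(tokens), window, stride):
        for s, e, label in predict_window(tokens[start:end]):
            # Shift window-local offsets to document-level offsets.
            merged.add((start + s, start + e, label))
    return sorted(merged)
```

With `window=512` and `stride=384`, consecutive windows share 128 tokens, so an entity shorter than the overlap appears whole in at least one window. An entity cut by a window edge can still produce a truncated span in the other window; a fuller implementation might keep only the prediction from the window where the span sits furthest from the edge.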
The Brief
What you'll do, and what you'll demonstrate.
Train a domain-tuned NER model for 9 contract entity types and report per-entity F1 on a 200-contract held-out set.
Earning criteria — what you'll demonstrate
- Adapt pretrained NER to a domain-specific entity schema
- Apply sliding-window strategies for long-document NER
- Evaluate sequence-labeling models per-entity and per-document
- Document an NER model for downstream product integration
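The per-entity and per-document evaluation named in the criteria above can be sketched as follows. This assumes exact-match scoring, where a predicted span counts only if its start, end, and label all equal a gold span; the challenge text does not say whether partial matches earn credit, so treat that as an assumption. Labels such as `"PARTY"` in any usage are hypothetical, since the 9 entity types are not listed here.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label)


def per_entity_f1(gold_docs: List[Set[Span]],
                  pred_docs: List[Set[Span]]) -> Dict[str, float]:
    """Exact-match F1 per entity label, pooled over all documents."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_docs, pred_docs):
        for span in pred:
            # A prediction is a true positive only on an exact span+label match.
            (tp if span in gold else fp)[span[2]] += 1
        for span in gold - pred:
            fn[span[2]] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def document_coverage(gold: Set[Span], pred: Set[Span]) -> float:
    """Fraction of one document's gold entities that were found (exact match).

    Assumption: a document with no gold entities counts as fully covered.
    """
    return len(gold & pred) / len(gold) if gold else 1.0
```

Document-level coverage is effectively per-document recall, so it complements the pooled F1: a model can score well overall while missing most entities in a few unusual contracts, and the per-document view exposes that.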
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
NLP Engineer
Domain-adapted NER on long documents is bread-and-butter work for NLP engineers at legal-tech, fintech-compliance, and healthcare-AI companies.
This challenge sharpens
- named-entity-recognition
- sequence-labeling
- domain-adaptation
Machine Learning Engineer
Shipping a trained NER model with annotation guidelines and an integration spec is the end-to-end MLE work that vertical AI consultancies need on every engagement.
This challenge sharpens
- transformers
- pytorch
- long-document-nlp
Applied AI Scientist
Translating NER research methods into a domain-specific product capability is exactly the applied-AI work done in legal-tech and verticalized AI.
This challenge sharpens
- named-entity-recognition
- domain-adaptation
- transformers