Overview
What this challenge is about.
You receive a curated extraction dataset (2,000 train, 500 val, 500 test contracts with span-level labels across 12 clause types) and a fine-tunable base model in the 7-8B range (e.g., Llama-3-8B Instruct or Mistral-7B). Train a LoRA adapter on a single rented A100. Compare it against a 5-shot prompted baseline on per-clause F1 and macro-F1, latency, and dollar cost per 1,000 contracts. Report the results, ship the adapter, and write a two-page memo with a clear go/no-go recommendation backed by per-clause numbers.
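The dollar cost per 1,000 contracts can be derived from measured latency and the GPU rental rate. The sketch below is one way to do that arithmetic, not a prescribed method: the function name, the hourly rate, the latency, and the training-cost figures are all hypothetical placeholders, and it assumes contracts are processed one at a time on a single GPU.

```python
def cost_per_1000_docs(seconds_per_doc, gpu_dollars_per_hour,
                       one_time_cost=0.0, total_docs=None):
    """Dollar cost to process 1,000 contracts on one rented GPU.

    Assumes sequential processing (no batching). If a one-time cost
    (e.g. the fine-tuning run) and an expected lifetime volume are given,
    that cost is spread evenly across the volume.
    """
    gpu_hours = seconds_per_doc * 1000 / 3600
    inference_cost = gpu_hours * gpu_dollars_per_hour
    amortized = 0.0
    if one_time_cost and total_docs:
        amortized = one_time_cost / total_docs * 1000
    return inference_cost + amortized


# Hypothetical numbers for illustration only; substitute your own measurements.
inference_only = cost_per_1000_docs(seconds_per_doc=1.8, gpu_dollars_per_hour=2.0)
with_training = cost_per_1000_docs(1.8, 2.0, one_time_cost=12.0, total_docs=100_000)
print(f"inference only: ${inference_only:.2f} per 1,000 contracts")
print(f"with training amortized: ${with_training:.2f} per 1,000 contracts")
```

Whether the training run is amortized into the fine-tuned model's cost is a design choice that should be stated in the memo, since it changes the comparison at low volumes.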
The Brief
What you'll do, and what you'll demonstrate.
Fine-tune a 7B-class LLM with LoRA for legal-clause extraction and quantify whether it beats prompt engineering on accuracy, latency, and cost per document.
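To get a feel for why LoRA fits on a single GPU, it helps to count the trainable parameters. LoRA adds two low-rank matrices, of shapes (d_out, r) and (r, d_in), to each adapted weight matrix, so each one contributes r * (d_in + d_out) trainable parameters. The sketch below applies that count under assumed settings: rank 16, adapters on the query and value projections only, and Mistral-7B-like dimensions (hidden size 4096, key/value projection width 1024, 32 layers). Treat all of these as illustrative assumptions and check the model's config before relying on them.

```python
def lora_trainable_params(rank, shapes_per_layer, num_layers):
    """Count LoRA trainable parameters.

    shapes_per_layer: list of (d_in, d_out) for each adapted matrix in a layer.
    Each adapted matrix gets A (rank x d_in) and B (d_out x rank).
    """
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes_per_layer)
    return per_layer * num_layers


# Assumed dimensions (illustrative): q_proj is 4096 -> 4096, v_proj is 4096 -> 1024.
adapted = [(4096, 4096), (4096, 1024)]
total = lora_trainable_params(rank=16, shapes_per_layer=adapted, num_layers=32)
print(f"{total:,} trainable parameters")  # 6,815,744
```

Under these assumptions the adapter has under 7 million trainable parameters, a small fraction of a percent of the base model's roughly 7 billion weights.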
Earning criteria — what you'll demonstrate
- Implement LoRA fine-tuning on a 7B base model
- Design a fair comparison between a prompted baseline and a fine-tuned model
- Evaluate extraction quality with span-level macro-F1
- Reason about the cost/quality trade-off for LLM products
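Span-level macro-F1 can be sketched as follows. This version assumes exact-match scoring (a predicted span counts only if its document, clause type, start, and end all match a gold span) and a hypothetical tuple layout of (doc_id, clause_type, start, end); the challenge's actual matching rule and data format may differ. Macro-F1 is the unweighted mean of the per-clause F1 scores, so rare clause types count as much as common ones.

```python
from collections import defaultdict


def per_clause_f1(gold, pred, clause_types):
    """Exact-match F1 per clause type.

    gold, pred: iterables of (doc_id, clause_type, start, end) tuples.
    """
    gold_by, pred_by = defaultdict(set), defaultdict(set)
    for doc_id, clause, start, end in gold:
        gold_by[clause].add((doc_id, start, end))
    for doc_id, clause, start, end in pred:
        pred_by[clause].add((doc_id, start, end))

    scores = {}
    for clause in clause_types:
        g, p = gold_by[clause], pred_by[clause]
        tp = len(g & p)  # spans that match exactly
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores[clause] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of per-clause F1 scores."""
    return sum(scores.values()) / len(scores)


# Toy example with made-up spans: one termination span is off by two characters.
gold = [("c1", "indemnity", 10, 50), ("c1", "termination", 100, 140),
        ("c2", "termination", 5, 30)]
pred = [("c1", "indemnity", 10, 50), ("c1", "termination", 100, 140),
        ("c2", "termination", 5, 28)]
scores = per_clause_f1(gold, pred, ["indemnity", "termination"])
print(scores)            # {'indemnity': 1.0, 'termination': 0.5}
print(macro_f1(scores))  # 0.75
```

Scoring the prompted baseline and the fine-tuned model with the same function on the same test split keeps the comparison fair, and the per-clause dictionary gives the numbers the memo needs.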
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Machine Learning Engineer
Shipping a LoRA adapter end-to-end with a cost analysis and a go/no-go memo is typical of the work an MLE does at an LLM-powered product startup.
This challenge sharpens
- lora
- fine-tuning
- huggingface
NLP Engineer
Span-level extraction evaluation with macro-F1 across many clause types is core NLP-engineer work in any text-extraction product.
This challenge sharpens
- llm-evaluation
- fine-tuning
- huggingface
AI Engineer
Translating training results into a shippable adapter plus a cost memo for leadership is the AI-engineer craft of putting LLMs into product.
This challenge sharpens
- lora
- pytorch
- parameter-efficient-tuning