Overview
What this challenge is about.
You receive 30 research questions, each with a paralegal-written gold answer and citation list, plus stubbed implementations of the 4 research tools. You do not need to build retrieval; just call the stubs. Design a ReAct prompt with a clear Thought/Action/Observation loop and JSON-schema tool calls.
Submissions are scored on three axes: (a) correct tool-call sequence, checked against paralegal labels; (b) citation precision and recall against the gold list; and (c) memo readability, rated by one reviewer on a 1-5 scale. Success means tool-call accuracy above 80 percent, citation F1 above 0.75, and average readability above 4.0.
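A minimal Python sketch of the three scoring axes and the pass thresholds follows. The challenge does not say how tool-call accuracy is computed or how citation F1 is averaged across questions, so this assumes an exact-match comparison of each question's tool sequence against the paralegal label and a per-question (macro) average of set-based F1, with citations already normalised to comparable strings. The names `citation_prf`, `QuestionResult`, and `score` are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

# Success thresholds from the challenge description.
TOOL_CALL_ACCURACY_TARGET = 0.80
CITATION_F1_TARGET = 0.75
READABILITY_TARGET = 4.0


def citation_prf(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 of predicted citations vs. the gold list.

    Assumes citations were normalised beforehand (same reporter format, spacing, etc.).
    """
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


@dataclass
class QuestionResult:
    tool_calls: list[str]       # tool names the agent called, in order
    gold_tool_calls: list[str]  # paralegal-labelled sequence
    citations: list[str]        # citations in the agent's memo
    gold_citations: list[str]   # gold citation list
    readability: int            # reviewer score, 1-5


def score(results: list[QuestionResult]) -> dict[str, float | bool]:
    n = len(results)
    # Assumption: a question's tool calls count as correct only on an exact sequence match.
    tool_acc = sum(r.tool_calls == r.gold_tool_calls for r in results) / n
    # Assumption: macro-average of per-question F1.
    mean_f1 = sum(citation_prf(r.citations, r.gold_citations)[2] for r in results) / n
    mean_read = sum(r.readability for r in results) / n
    return {
        "tool_call_accuracy": tool_acc,
        "citation_f1": mean_f1,
        "readability": mean_read,
        "passed": (tool_acc > TOOL_CALL_ACCURACY_TARGET
                   and mean_f1 > CITATION_F1_TARGET
                   and mean_read > READABILITY_TARGET),
    }
```

Micro-averaging (pooling hits across all 30 questions) is an equally defensible choice that weights citation-heavy questions more; whichever you pick, state it next to the reported number.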
The Brief
What you'll do, and what you'll demonstrate.
Design a ReAct-style legal-research agent that calls the 4 tools in the correct sequence and produces a cited memo a paralegal would accept.
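The challenge does not name the 4 tools, so the sketch below uses hypothetical placeholders (`search_case_law`, `get_document`, `get_statute`, `check_citation`) to show one way to pair JSON-schema tool definitions with a Thought/Action/Observation system prompt, plus a parser that rejects malformed calls before they reach a stub. The prompt wording and the single-line `Action: {...}` format are assumptions, not a required convention.

```python
from __future__ import annotations

import json
import re

# HYPOTHETICAL tool names and arguments: the challenge ships 4 stubs but does not name them here.
TOOLS = [
    {"name": "search_case_law",
     "description": "Search case law for authorities relevant to a question.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"},
                                   "jurisdiction": {"type": "string"}},
                    "required": ["query"]}},
    {"name": "get_document",
     "description": "Fetch the full text of a document returned by a search.",
     "parameters": {"type": "object",
                    "properties": {"doc_id": {"type": "string"}},
                    "required": ["doc_id"]}},
    {"name": "get_statute",
     "description": "Fetch the text of a statute section.",
     "parameters": {"type": "object",
                    "properties": {"section": {"type": "string"}},
                    "required": ["section"]}},
    {"name": "check_citation",
     "description": "Check that a citation exists and is still good law.",
     "parameters": {"type": "object",
                    "properties": {"citation": {"type": "string"}},
                    "required": ["citation"]}},
]

# Doubled braces become literal braces under str.format.
SYSTEM_PROMPT = """You are a legal-research assistant. Work in a loop:
Thought: what you know so far and what you still need.
Action: {{"tool": "<tool name>", "args": {{...}}}}
Observation: (written by the system after each Action; never write it yourself)
Emit exactly one Action per turn. When every claim is supported, write:
Final Answer: a memo that cites only authorities you retrieved.

Tools (args follow each tool's JSON Schema):
{tools}"""


def build_prompt() -> str:
    return SYSTEM_PROMPT.format(tools=json.dumps(TOOLS, indent=2))


ACTION_RE = re.compile(r"^Action:\s*(\{.*\})\s*$", re.MULTILINE)


def parse_action(model_output: str) -> tuple[str, dict]:
    """Return (tool, args) from the first Action line; raise ValueError if the call is invalid."""
    match = ACTION_RE.search(model_output)
    if match is None:
        raise ValueError("no Action line found")
    call = json.loads(match.group(1))  # json.JSONDecodeError subclasses ValueError
    schemas = {t["name"]: t["parameters"] for t in TOOLS}
    name, args = call.get("tool"), call.get("args", {})
    if not isinstance(name, str) or name not in schemas:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    schema = schemas[name]
    missing = [key for key in schema["required"] if key not in args]
    if missing:
        raise ValueError(f"{name} is missing required args: {missing}")
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            raise ValueError(f"{name} got unexpected arg {key!r}")
        if prop["type"] == "string" and not isinstance(value, str):
            raise ValueError(f"{name}.{key} must be a string")
    return name, args
```

The hand-rolled check covers only `required` keys, unknown keys, and string types; for full JSON Schema validation, the third-party `jsonschema` package's `validate(instance, schema)` is one option.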
Earning criteria — what you'll demonstrate
- Design a ReAct prompt with clean tool-call schemas
- Build an agent loop that handles tool errors and dead-ends
- Evaluate agent quality on multiple axes (correctness, citation, readability)
- Document failure modes for product iteration
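For the agent loop, one approach is to turn every tool failure into an `ERROR:` observation the model can react to, treat a repeated identical call as a dead-end signal, and cap both steps and errors. This sketch assumes a separate (hypothetical) parsing step has already turned each model turn into either `{"tool": ..., "args": ...}` or `{"final": ...}`; `run_agent`, `max_steps`, and `max_errors` are illustrative names and defaults, not part of the challenge stubs.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def run_agent(
    model: Callable[[list[dict]], dict],
    tools: dict[str, Callable[..., str]],
    max_steps: int = 8,
    max_errors: int = 3,
) -> dict[str, Any]:
    """Run a ReAct loop until a final answer, too many errors, or the step budget runs out.

    HYPOTHETICAL interface: `model` receives the transcript so far and returns either
    {"tool": name, "args": {...}} or {"final": memo_text}.
    """
    transcript: list[dict] = []
    seen: set[tuple[str, str]] = set()
    errors = 0
    for _ in range(max_steps):
        decision = model(transcript)
        if "final" in decision:
            return {"status": "answered", "memo": decision["final"], "transcript": transcript}
        name, args = decision.get("tool"), decision.get("args", {})
        call_key = (str(name), json.dumps(args, sort_keys=True))
        if call_key in seen:
            # Dead-end: an identical call would just return the same observation again.
            observation = "ERROR: repeated identical call; change the query or the tool."
            errors += 1
        elif not isinstance(name, str) or name not in tools:
            observation = f"ERROR: unknown tool {name!r}; available: {sorted(tools)}"
            errors += 1
        else:
            seen.add(call_key)  # recorded even if the call fails, so blind retries are caught
            try:
                observation = tools[name](**args)
            except Exception as exc:  # report stub failures to the model instead of crashing
                observation = f"ERROR: {type(exc).__name__}: {exc}"
                errors += 1
        transcript.append({"tool": name, "args": args, "observation": observation})
        if errors >= max_errors:
            return {"status": "gave_up", "reason": "too many tool errors", "transcript": transcript}
    return {"status": "gave_up", "reason": "step budget exhausted", "transcript": transcript}
```

Returning the transcript with a `gave_up` status, rather than raising, keeps failed runs available as raw material for the failure-mode write-up.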
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Building a tool-using ReAct agent with a real evaluation suite is a day-one skill set for AI engineer roles at vertical AI startups.
This challenge sharpens
- react-prompting
- tool-use
- agent-design
Prompt Engineer
Designing the ReAct system prompt and tool schemas under accuracy and readability targets is core prompt-engineer territory.
This challenge sharpens
- react-prompting
- json-schema
- evaluation
NLP Engineer
Citation handling and document-grounded generation are bread-and-butter NLP-engineer work at legal-tech companies.
This challenge sharpens
- citation-handling
- tool-use
- evaluation