Overview
What this challenge is about.
You will implement the agent in either LangChain or LlamaIndex (your choice; defend it in the README). Wire up 4 tools: (1) read-only SQL on a sample warehouse, (2) a mocked BI metrics REST API, (3) a document-search retriever over 5,000 internal PDFs, (4) a calendar-lookup tool. Implement tool-calling with structured outputs, multi-step reasoning, and a max-step guard. Build a 60-question evaluation set covering single-tool, multi-tool, and 'no tool needed' questions; grade the agent's answers against a rubric. Deliver: agent code, eval harness, eval results, and a 3-page architecture doc.
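To make the "structured outputs" and "max-step guard" requirements concrete, here is a minimal, framework-free sketch of one way the loop could look. It is not the LangChain or LlamaIndex API. The tool names, the JSON message shape (`tool`/`args` or `final_answer`), and the `run_agent` and `scripted_model` helpers are all assumptions made for illustration, and the model is a scripted stand-in rather than a real LLM.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical tool registry: names and stub behaviour are illustrative only.
TOOLS = {
    "sql_query": lambda query: f"rows for: {query}",
    "calendar_lookup": lambda date: f"events on {date}",
}


@dataclass
class RunResult:
    answer: Optional[str]
    steps_used: int
    stopped_by_guard: bool


def run_agent(model: Callable[[list], str], question: str, max_steps: int = 5) -> RunResult:
    """Call the model until it returns a final answer or max_steps is reached."""
    transcript = [f"question: {question}"]
    for step in range(1, max_steps + 1):
        raw = model(transcript)
        # Structured output: every model turn must be a JSON object.
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            transcript.append("error: output was not valid JSON")
            continue
        if not isinstance(message, dict):
            transcript.append("error: output was not a JSON object")
            continue
        if "final_answer" in message:
            return RunResult(message["final_answer"], step, False)
        tool = TOOLS.get(message.get("tool"))
        if tool is None:
            # The model named a tool that does not exist; report it back.
            transcript.append(f"error: unknown tool {message.get('tool')!r}")
            continue
        try:
            observation = tool(**message.get("args", {}))
        except TypeError as exc:
            transcript.append(f"error: bad arguments: {exc}")
            continue
        transcript.append(f"observation: {observation}")
    # Max-step guard: give up instead of looping forever.
    return RunResult(None, max_steps, True)


def scripted_model(transcript: list) -> str:
    """Stand-in for an LLM: calls one tool, then answers with what it saw."""
    if len(transcript) == 1:
        return json.dumps({"tool": "sql_query", "args": {"query": "SELECT 1"}})
    return json.dumps({"final_answer": transcript[-1]})


result = run_agent(scripted_model, "How many orders shipped last week?")
```

In this sketch, errors are fed back into the transcript so the model can correct itself, but each failed attempt still consumes a step. That is one possible design choice; you may prefer a separate retry budget.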
The Brief
What you'll do, and what you'll demonstrate.
Build an internal reporting agent that chains 4 tools correctly and survives a 60-question evaluation harness.
Earning criteria — what you'll demonstrate
- Implement tool-calling and multi-step reasoning in an agent framework
- Design tool schemas that an LLM can call reliably
- Build an evaluation harness that catches tool-call hallucinations
- Communicate agent architecture and trade-offs in writing
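As a rough illustration of the evaluation criterion above, the sketch below shows one way a harness might flag tool-call hallucinations and report a pass rate per question type. It checks tool selection only; rubric grading of the answer text is not shown. The tool names, the category labels, and the `EvalCase`, `grade`, and `summarize` helpers are hypothetical, since the brief does not prescribe any of them.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed tool names; the brief does not fix any identifiers.
KNOWN_TOOLS = {"sql_query", "bi_metrics", "doc_search", "calendar_lookup"}


@dataclass
class EvalCase:
    question: str
    category: str        # e.g. "single-tool", "multi-tool", "no-tool"
    expected_tools: set  # tools a correct run should call


def grade(case: EvalCase, tools_called: list) -> dict:
    """Compare the tools an agent run called against the expectation."""
    called = set(tools_called)
    return {
        # Calls to tools that were never registered.
        "hallucinated": sorted(called - KNOWN_TOOLS),
        # Real tools that this question did not need.
        "unneeded": sorted((called & KNOWN_TOOLS) - case.expected_tools),
        # Required tools the agent never called.
        "missing": sorted(case.expected_tools - called),
    }


def summarize(cases: list, traces: list) -> dict:
    """Return the pass rate per question category."""
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, total]
    for case, tools_called in zip(cases, traces):
        totals[case.category][1] += 1
        if not any(grade(case, tools_called).values()):
            totals[case.category][0] += 1
    return {cat: ok / n for cat, (ok, n) in totals.items()}


cases = [
    EvalCase("Total revenue in Q1?", "single-tool", {"sql_query"}),
    EvalCase("What does KPI stand for?", "no-tool", set()),
    EvalCase("Compare churn to the policy doc.", "multi-tool", {"sql_query", "doc_search"}),
]
# Tool names recorded from three hypothetical agent runs.
traces = [["sql_query"], ["web_search"], ["sql_query"]]
report = summarize(cases, traces)
```

Here the second run fails because `web_search` is not a registered tool, and the third fails because `doc_search` was never called. Keeping the three failure kinds separate makes the per-category breakdown easier to explain in the architecture doc.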
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Building a tool-calling agent with a real evaluation harness is day-to-day work for AI engineers at AI-product companies.
This challenge sharpens
- agent-orchestration
- tool-calling
- structured-outputs
AI Solutions Architect
Defending framework choice and tool design in writing is what AI solutions architects do daily in enterprise engagements.
This challenge sharpens
- agent-orchestration
- langchain-or-llamaindex
- agent-evaluation
AI Product Manager
Owning the eval harness and per-question-type breakdown lets a PM gate ship decisions on real evidence.
This challenge sharpens
- agent-evaluation
- tool-calling
- structured-outputs