Overview
What this challenge is about.
You receive OpenAPI specs for 4 mock internal APIs and 30 reference question-answer pairs, ranging from easy lookups to multi-tool chains. Build the agent with an LLM tool-use framework (e.g., LangChain or LlamaIndex) or with Anthropic Claude or OpenAI tool calling directly. Add: (a) structured tool definitions, (b) max-step and max-cost budgets, (c) a deny-list on destructive actions (no DELETE calls), and (d) an evaluation harness that scores task success on the reference set. Then write a design memo for the customer's CIO covering the architecture, failure modes, and rollout plan.
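A minimal sketch of how the budgets in (b) and the deny-list in (c) might be enforced, in Python. It assumes each OpenAPI operation has already been wrapped as a structured tool carrying its HTTP method and a rough per-call cost estimate. `Tool`, `GuardedExecutor`, `BudgetExceeded`, and `DeniedAction` are hypothetical names, not part of LangChain, LlamaIndex, or either vendor SDK; the idea is that every tool call the model requests is routed through `call()` instead of reaching the APIs directly.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


class BudgetExceeded(Exception):
    """Raised when the agent would exceed its step or cost budget."""


class DeniedAction(Exception):
    """Raised when the agent requests an operation on the deny-list."""


@dataclass
class Tool:
    """One structured tool, derived from a single OpenAPI operation (hypothetical wrapper)."""
    name: str
    method: str                  # HTTP method of the operation, e.g. "GET"
    path: str                    # OpenAPI path template
    handler: Callable[..., Any]  # function that actually calls the mock API
    cost_usd: float = 0.0        # hypothetical per-call cost estimate


@dataclass
class GuardedExecutor:
    """Runs tool calls requested by the model, enforcing budgets and a deny-list."""
    tools: Dict[str, Tool]
    max_steps: int = 10
    max_cost_usd: float = 0.50
    denied_methods: FrozenSet[str] = frozenset({"DELETE"})
    steps: int = 0
    cost_usd: float = 0.0
    audit_log: List[str] = field(default_factory=list)

    def charge(self, usd: float) -> None:
        """Record spend (tool calls or, e.g., model tokens) against the cost budget."""
        if self.cost_usd + usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost budget ${self.max_cost_usd:.2f} exhausted")
        self.cost_usd += usd

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self.tools.get(name)
        if tool is None:
            raise KeyError(f"unknown tool: {name}")
        # Deny-list check comes first: a destructive call is refused outright
        # and the refusal is recorded for the audit trail.
        if tool.method.upper() in self.denied_methods:
            self.audit_log.append(f"DENIED {tool.method} {tool.path}")
            raise DeniedAction(f"{tool.method} {tool.path} is not allowed")
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        self.charge(tool.cost_usd)  # raises before the step is counted
        self.steps += 1
        self.audit_log.append(f"OK {tool.method} {tool.path}")
        return tool.handler(**kwargs)


if __name__ == "__main__":
    # Hypothetical tools standing in for two operations from a mock API spec.
    tools = {
        "get_ticket": Tool("get_ticket", "GET", "/tickets/{ticket_id}",
                           lambda ticket_id: {"id": ticket_id, "status": "open"}, 0.01),
        "delete_ticket": Tool("delete_ticket", "DELETE", "/tickets/{ticket_id}",
                              lambda ticket_id: None, 0.01),
    }
    executor = GuardedExecutor(tools, max_steps=3)
    print(executor.call("get_ticket", ticket_id=7))
    try:
        executor.call("delete_ticket", ticket_id=7)
    except DeniedAction as err:
        print("refused:", err)
```

Enforcing the deny-list in the executor, rather than only in the system prompt, means the guardrail holds even if the model ignores its instructions. Whether a `BudgetExceeded` or `DeniedAction` is returned to the model as a tool error or ends the run outright is a design choice this sketch leaves open.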
The Brief
What you'll do, and what you'll demonstrate.
Ship a tool-using LLM agent that safely resolves at least 80% of the reference questions, with budgets and guardrails the customer's CIO can sign off on.
Earning criteria — what you'll demonstrate
- Design tool-using LLM agents with explicit budgets and guardrails
- Build an evaluation harness for agent task success
- Reason about failure modes specific to tool-use agents
- Communicate agent architecture to a non-engineering executive
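One way an evaluation harness could score task success on the 30 reference pairs, sketched in Python under the assumption that the agent is exposed as a function from a question string to a final answer string. Exact match after case and whitespace normalization is a deliberately strict, hypothetical criterion; multi-tool questions with free-text answers may need tolerant matching or a rubric-based grader instead. `RefCase`, `normalize`, and `score` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RefCase:
    """One reference question-answer pair."""
    question: str
    expected: str


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences don't fail a case."""
    return " ".join(text.lower().split())


def score(agent: Callable[[str], str], cases: List[RefCase], target: float = 0.80) -> dict:
    """Run the agent on every case and report the task-success rate against a target."""
    results = []
    for case in cases:
        error: Optional[str] = None
        try:
            answer = agent(case.question)
            ok = normalize(answer) == normalize(case.expected)
        except Exception as exc:
            # Any abort (budget exhausted, denied action, API error) counts as a failure.
            ok, error = False, type(exc).__name__
        results.append({"question": case.question, "ok": ok, "error": error})
    rate = sum(r["ok"] for r in results) / len(cases) if cases else 0.0
    return {"success_rate": rate, "passed": rate >= target, "results": results}
```

Because a run that hits a guardrail is never counted as a success, the reported rate cannot be inflated by an agent that reaches answers unsafely. The `target=0.80` default mirrors the 80% goal in the brief; keeping the per-case `error` field makes it easy to tally which failure modes dominate.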
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Building a tool-using agent for an enterprise customer end-to-end is the canonical AI-engineer project at agent-platform startups.
This challenge sharpens
- llm-agents
- tool-use
- guardrails