Overview
What this challenge is about.
You receive a synthetic dataset of 60 founder-style queries paired with 'workspaces' (each holding up to 500 documents across 3 source types), plus gold-standard answers and citation lists. Build an agentic RAG system that (a) plans a retrieval sequence, (b) summarizes intermediate results, (c) tracks a hard context-token budget across iterations, and (d) produces a final cited answer. Evaluate it against a single-shot RAG baseline on (i) answer correctness as scored by an LLM judge, (ii) citation precision and recall, (iii) total token cost, and (iv) latency. Success means a correctness lift of at least 10 points at no more than 1.5x the baseline token cost.
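The plan, retrieve, summarize, and budget steps above can be sketched as a small loop. This is a minimal illustration, not a reference solution: `TokenBudget`, `run_agent`, and the `plan`, `retrieve`, and `summarize` callables are hypothetical names, and `count_tokens` uses a whitespace split as a stand-in for whatever tokenizer your model actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: a whitespace split approximates the real tokenizer.
    return len(text.split())


@dataclass
class TokenBudget:
    """Hard cap on the tokens the agent may keep in context."""
    limit: int
    used: int = 0

    def remaining(self) -> int:
        return self.limit - self.used

    def try_spend(self, text: str) -> bool:
        # Refuse the spend entirely if it would exceed the cap.
        cost = count_tokens(text)
        if cost > self.remaining():
            return False
        self.used += cost
        return True


def run_agent(
    query: str,
    plan: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    summarize: Callable[[str, List[str]], str],
    budget: TokenBudget,
) -> List[str]:
    """Iterate over planned retrieval steps, keeping summaries that fit the budget."""
    notes: List[str] = []
    for step in plan(query):
        docs = retrieve(step)
        summary = summarize(step, docs)
        if not budget.try_spend(summary):
            break  # budget exhausted: stop and answer from the notes so far
        notes.append(summary)
    return notes
```

This sketch only charges the summaries that stay in context. A fuller accounting would also charge the prompt and retrieved-document tokens sent to the model at each iteration, since those count toward the total token cost being evaluated.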
The Brief
What you'll do, and what you'll demonstrate.
Design and evaluate an agentic RAG system with a context-window budget controller that beats single-shot RAG on answer correctness without runaway token cost.
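The success threshold stated in the overview (a correctness lift of at least 10 points at no more than 1.5x baseline token cost) can be checked with a few lines of arithmetic. The sketch below assumes correctness is reported on a 0 to 100 scale and that "points" means an absolute difference between the two scores; the function name and parameters are hypothetical.

```python
def meets_success_criterion(
    agentic_score: float,
    baseline_score: float,
    agentic_tokens: int,
    baseline_tokens: int,
    min_lift: float = 10.0,
    max_cost_ratio: float = 1.5,
) -> bool:
    """Return True if the agentic run clears both the quality and cost bars.

    Assumption: scores are on a 0-100 scale and lift is an absolute difference.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    lift = agentic_score - baseline_score
    cost_ratio = agentic_tokens / baseline_tokens
    return lift >= min_lift and cost_ratio <= max_cost_ratio
```

For example, an agentic score of 72 against a baseline of 60 is a 12-point lift; at 1,400 tokens against 1,000 the cost ratio is 1.4, so both conditions hold.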
Earning criteria — what you'll demonstrate
- Design an iterative retrieval + summarization agent under a hard token budget
- Compare agentic vs. single-shot RAG fairly on cost and quality
- Apply citation tracking across multiple agent iterations
- Communicate cost-quality trade-offs in a product memo
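One way to keep citations alive across iterations is to attach source document IDs to every intermediate note and take their union whenever notes are merged, then score the final citation set against the gold list. The sketch below is illustrative only: `Note`, `merge_notes`, and `citation_precision_recall` are hypothetical names, and the metrics use plain set overlap, which may differ from the challenge's official scorer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Note:
    """An intermediate summary plus the IDs of the documents it drew on."""
    summary: str
    source_ids: FrozenSet[str]


def merge_notes(notes: List[Note], merged_summary: str) -> Note:
    # The merged note inherits every source ID from the notes it replaces,
    # so summarizing never silently drops a citation.
    all_ids = frozenset().union(*(n.source_ids for n in notes))
    return Note(merged_summary, all_ids)


def citation_precision_recall(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float]:
    """Set-overlap precision and recall; empty sets score 0.0 by convention here."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall
```

Taking the union favors recall: every document touched along the way ends up cited. If precision matters more, the final answer step could instead cite only the IDs of notes it actually quotes.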
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Designing agentic RAG with explicit cost discipline is the kind of practical work AI engineers do at product-led AI startups.
This challenge sharpens
- agentic-rag
- iterative-retrieval
- tool-use
Machine Learning Engineer
Building a fair quality-and-cost comparison between agentic and single-shot variants is core MLE work in LLM product teams.
This challenge sharpens
- rag-evaluation
- context-window-management
- python
Prompt Engineer
Designing the per-iteration prompts and summarization templates that keep citations alive is core prompt-engineer territory.
This challenge sharpens
- iterative-retrieval
- tool-use
- context-window-management