Overview
What this challenge is about.
You receive 1,800 pages of policy documents (Markdown) and 150 labeled question-answer pairs, each annotated with its gold source policy IDs. Build a hybrid retrieval pipeline: retrieve with BM25 and dense embeddings, fuse the two rankings via Reciprocal Rank Fusion (RRF), then apply a cross-encoder reranker to the top 20 fused candidates. Generate a final answer with explicit policy-ID citations. Evaluate on (a) Hits@5 against a vector-only baseline, (b) RAGAS faithfulness on a 50-pair sample, and (c) citation-ID precision (the share of cited policy IDs that match the gold IDs). Success means a Hits@5 improvement of at least 12 points over the baseline, faithfulness above 0.90, and citation precision above 95 percent.
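As a rough illustration of the fusion step, here is a minimal Python sketch of RRF over two ranked lists. The function name, the constant `k=60` (a commonly used default, not a value given in this brief), and the policy IDs are all assumptions made for the example. In a real pipeline the two lists would come from your BM25 and dense retrievers.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=20):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not from the brief.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical policy IDs, for illustration only.
bm25_ranking = ["POL-007", "POL-112", "POL-031"]
dense_ranking = ["POL-112", "POL-045", "POL-007"]

# The fused top-20 list is what the cross-encoder reranker would receive.
candidates = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(candidates)
```

RRF uses only rank positions, so the BM25 and dense scores never need to be put on a common scale. That is one reason it is a popular first choice for hybrid fusion.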
The Brief
What you'll do, and what you'll demonstrate.
Build a hybrid RAG pipeline (BM25 + dense retrieval + reranker) that materially beats vector-only retrieval on an HR-policy assistant.
Earning criteria: what you'll demonstrate
- Combine sparse (BM25) and dense retrieval with rank fusion
- Apply a cross-encoder reranker to lift top-k quality
- Evaluate RAG with multiple metrics (retrieval, faithfulness, citation)
- Communicate retrieval improvements to a non-ML client
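The retrieval and citation metrics can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. A question counts as a hit when any of its gold policy IDs appears in the top 5, and citation precision is taken over all cited IDs. The brief does not spell out either definition, so treat both as assumptions. The function names, question keys, and policy IDs are hypothetical. RAGAS faithfulness is left out because it requires an LLM judge.

```python
def hits_at_k(retrieved, gold, k=5):
    """Fraction of questions with at least one gold policy ID in the top-k results."""
    hits = 0
    for question, gold_ids in gold.items():
        top_k = retrieved.get(question, [])[:k]
        if set(top_k) & set(gold_ids):
            hits += 1
    return hits / len(gold)


def citation_precision(cited, gold):
    """Fraction of cited policy IDs that appear in the question's gold set."""
    total = 0
    correct = 0
    for question, cited_ids in cited.items():
        for policy_id in cited_ids:
            total += 1
            if policy_id in gold.get(question, ()):
                correct += 1
    return correct / total if total else 0.0


# Hypothetical toy data, for illustration only.
gold = {"q1": ["POL-007"], "q2": ["POL-112"]}
vector_only = {"q1": ["POL-031", "POL-045"], "q2": ["POL-112", "POL-007"]}
hybrid = {"q1": ["POL-007", "POL-031"], "q2": ["POL-112", "POL-045"]}
cited = {"q1": ["POL-007"], "q2": ["POL-112", "POL-999"]}

baseline = hits_at_k(vector_only, gold)
improved = hits_at_k(hybrid, gold)
lift_points = 100 * (improved - baseline)  # improvement in percentage points
precision = citation_precision(cited, gold)
print(f"Hits@5 lift: {lift_points:.1f} points, citation precision: {precision:.2f}")
```

Reporting the lift in points, next to the baseline and hybrid Hits@5 values, gives a non-ML client one number to compare against the success threshold.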
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Shipping hybrid retrieval with a reranker and a real eval suite is core day-one work for AI engineers at enterprise-AI startups hiring in 2024-25.
This challenge sharpens
- hybrid-search
- dense-retrieval
- rag-evaluation
NLP Engineer
Combining BM25 with neural retrieval is the canonical NLP-engineer skill set in modern search and RAG teams.
This challenge sharpens
- bm25
- dense-retrieval
- reranking
Machine Learning Engineer
Owning a multi-metric eval and translating it into a client release is the kind of MLE follow-through that earns trust.
This challenge sharpens
- rag-evaluation
- python
- reranking