Overview
What this challenge is about.
You receive 40 anonymized 10-K filings and 100 labeled questions, split evenly into 50 narrative (e.g., 'what is the company's main risk factor?') and 50 numerical (e.g., 'what was operating income in FY23?'). Implement 4 chunking strategies: recursive-character (the baseline), semantic-similarity (using embeddings), layout-aware (preserving sections and tables), and hierarchical (summary plus leaf chunks). For each strategy, run an identical retrieval and generation pipeline so that chunking is the only variable, then score it on Hits@5 (whether a relevant chunk appears among the top 5 retrieved) and on a 30-question manual answer-correctness sample. Report results per strategy and per question type, and recommend one strategy. Success means the recommended strategy materially beats the baseline on numerical questions without losing ground on narrative ones.
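To make the scoring concrete, here is a minimal sketch of Hits@5 broken out by strategy and question type. It assumes each question carries a labeled evidence snippet and that a retrieved chunk counts as relevant when it contains that snippet; the record fields (`strategy`, `question_type`, `evidence`, `retrieved`) and the toy records are hypothetical, not part of the challenge data.

```python
from collections import defaultdict


def hit_at_k(retrieved_chunks, evidence, k=5):
    """Return 1 if any of the top-k chunks contains the evidence string, else 0.

    Assumption: relevance is judged by substring match against a labeled
    evidence snippet, so the same label works for every chunking strategy.
    """
    return int(any(evidence in chunk for chunk in retrieved_chunks[:k]))


def hits_by_group(records, k=5):
    """Average Hits@k for each (strategy, question_type) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [hits, questions]
    for rec in records:
        key = (rec["strategy"], rec["question_type"])
        totals[key][0] += hit_at_k(rec["retrieved"], rec["evidence"], k)
        totals[key][1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}


# Toy records for illustration only; not real filings or results.
toy_records = [
    {"strategy": "recursive", "question_type": "numerical",
     "evidence": "Operating income 120",
     "retrieved": ["Operating inc", "ome 120", "Item 1A. Risk Factors"]},
    {"strategy": "layout_aware", "question_type": "numerical",
     "evidence": "Operating income 120",
     "retrieved": ["Revenue 500\nOperating income 120", "Item 1A. Risk Factors"]},
    {"strategy": "recursive", "question_type": "narrative",
     "evidence": "supplier concentration",
     "retrieved": ["Our main risk is supplier concentration."]},
]

scores = hits_by_group(toy_records)
for (strategy, qtype), value in sorted(scores.items()):
    print(f"{strategy:>12} {qtype:>9} Hits@5={value:.2f}")
```

Matching on evidence text rather than chunk IDs is one possible design choice: chunk boundaries differ across the 4 strategies, so an ID-based label would not transfer between them.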
The Brief
What you'll do, and what you'll demonstrate.
Compare 4 chunking strategies for a 10-K research assistant and recommend one that lifts numerical-question accuracy without regressing on narrative questions.
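One way to turn that recommendation rule into something checkable is sketched below. The thresholds `min_numerical_gain` and `narrative_tolerance` are assumptions you would need to choose and justify yourself (the brief does not define "materially"), and the scores are placeholder numbers, not measured results.

```python
def recommend(scores, baseline="recursive", min_numerical_gain=0.05,
              narrative_tolerance=0.0):
    """Return the eligible strategy with the best numerical score, or None.

    A strategy is eligible if it beats the baseline on numerical questions
    by at least min_numerical_gain and scores no lower than the baseline
    minus narrative_tolerance on narrative questions.
    Both thresholds are hypothetical defaults, not values from the brief.
    """
    base = scores[baseline]
    eligible = {
        name: s for name, s in scores.items()
        if name != baseline
        and s["numerical"] - base["numerical"] >= min_numerical_gain
        and s["narrative"] >= base["narrative"] - narrative_tolerance
    }
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name]["numerical"])


# Placeholder scores for illustration only; not measured results.
toy_scores = {
    "recursive": {"narrative": 0.70, "numerical": 0.50},
    "semantic": {"narrative": 0.74, "numerical": 0.52},
    "layout_aware": {"narrative": 0.70, "numerical": 0.66},
    "hierarchical": {"narrative": 0.64, "numerical": 0.70},
}

choice = recommend(toy_scores)
print("Recommended:", choice)
```

Returning `None` when nothing qualifies keeps the rule honest: if no strategy clears both conditions, the write-up should say so rather than pick the least-bad option.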
Earning criteria — what you'll demonstrate
- Implement and compare 4 chunking strategies on real documents
- Design a fair evaluation that separates narrative and numerical questions
- Reason about chunking trade-offs (granularity, table preservation, embedding quality)
- Document the recommendation for a non-research engineering team
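As a starting point for the first criterion, the recursive-character baseline can be sketched in a few lines. This is a simplified illustration, assuming plain-text input and a character budget per chunk; the function name, separator order, and `max_chars` default are hypothetical, and it omits the chunk overlap that library splitters usually offer.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most max_chars characters.

    Tries the coarsest separator first (paragraphs), then falls back to
    finer ones (lines, words), and finally to a hard character cut.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: cut at fixed character offsets.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # A single piece is still too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current.strip():
        chunks.append(current)
    return chunks


sample = "aaa bbb\n\nccc ddd eee fff"
chunks = recursive_split(sample, max_chars=10)
print(chunks)
```

Because this splitter only looks at characters and separators, it can cut a financial table in the middle of a row, which is the kind of trade-off the layout-aware strategy is meant to address.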
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Running structured chunking bake-offs on real documents is the kind of evaluation work AI engineers do constantly when shipping RAG products.
This challenge sharpens
- document-chunking
- rag-evaluation
- experiment-design
NLP Engineer
Layout-aware and semantic chunking choices are core NLP-engineer territory in document-AI teams.
This challenge sharpens
- semantic-chunking
- layout-aware-chunking
- document-chunking
Applied AI Scientist
Comparing 4 strategies across a narrative/numerical question split and writing a methodology note is the kind of judgement work applied AI scientists do.
This challenge sharpens
- experiment-design
- rag-evaluation
- document-chunking