Chunking Strategy Bake-Off for Financial Filings

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You receive 40 anonymized 10-K filings and 100 labeled questions: 50 narrative (e.g., 'what is the company's main risk factor?') and 50 numerical (e.g., 'what was operating income in FY23?'). Implement four chunking strategies: recursive-character (the baseline), semantic-similarity (using embeddings), layout-aware (preserving sections and tables), and hierarchical (summary plus leaf chunks). Run each strategy through an identical retrieval and generation pipeline, then score it on Hits@5 and on a 30-question manual answer-correctness sample. Report results per strategy and per question type, and recommend one strategy. Success means the recommended strategy materially beats the baseline on numerical questions without losing ground on narrative ones.
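The Hits@5 breakdown by strategy and question type could be computed along these lines. This is a minimal sketch, not the challenge's official scorer: the record keys (`strategy`, `question_type`, `retrieved`, `relevant`) are hypothetical, and it assumes each question's relevant chunk ids have already been mapped onto the index built by that strategy, since chunk boundaries differ between strategies.

```python
from collections import defaultdict


def hits_at_k(retrieved_ids, relevant_ids, k=5):
    """Return 1 if any of the top-k retrieved chunk ids is labeled relevant, else 0."""
    return int(any(cid in relevant_ids for cid in retrieved_ids[:k]))


def score_by_strategy_and_type(records, k=5):
    """Average Hits@k for every (strategy, question_type) pair.

    Each record is a dict with hypothetical keys:
      strategy      - name of the chunking strategy
      question_type - "narrative" or "numerical"
      retrieved     - ranked list of retrieved chunk ids
      relevant      - set of chunk ids that contain the gold evidence
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [hits, question count]
    for r in records:
        key = (r["strategy"], r["question_type"])
        totals[key][0] += hits_at_k(r["retrieved"], r["relevant"], k)
        totals[key][1] += 1
    return {key: hits / count for key, (hits, count) in totals.items()}
```

Keeping the question type in the grouping key is what lets the report show whether a gain on numerical questions comes at the cost of narrative ones.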

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Compare 4 chunking strategies on a 10-K research assistant and recommend one that lifts numerical-question accuracy without regressing on narrative.
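One way to make the recommendation rule explicit is sketched below. It is an illustration under assumptions rather than a prescribed method: the function name, the shape of `scores`, and the `narrative_tolerance` parameter are all hypothetical, and what counts as a "material" lift is left for you to define and justify.

```python
def recommend(scores, baseline="recursive-character", narrative_tolerance=0.0):
    """Pick the strategy with the largest numerical lift over the baseline.

    scores: {strategy_name: {"narrative": float, "numerical": float}}
    A strategy qualifies only if it beats the baseline on numerical questions
    and its narrative score is not more than narrative_tolerance below the
    baseline's. Returns None when no strategy qualifies.
    """
    base = scores[baseline]
    candidates = [
        (s["numerical"] - base["numerical"], name)
        for name, s in scores.items()
        if name != baseline
        and s["numerical"] > base["numerical"]
        and s["narrative"] >= base["narrative"] - narrative_tolerance
    ]
    if not candidates:
        return None
    # max() compares the lift first; ties fall back to the strategy name.
    return max(candidates)[1]
```

Returning `None` is deliberate: if nothing clears both conditions, "keep the baseline" is a legitimate recommendation.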

Earning criteria — what you'll demonstrate

Implement and compare 4 chunking strategies on real documents
Design a fair evaluation that separates narrative and numerical questions
Reason about chunking trade-offs (granularity, table preservation, embedding quality)
Document the recommendation for a non-research engineering team
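To make the granularity trade-off concrete, here is a minimal sketch of a recursive-character baseline. It assumes a character budget and a fixed separator order, both hypothetical choices, and omits chunk overlap for brevity; a library splitter may behave differently.

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first and only falls back to finer ones
    for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]
```

Because this splitter only counts characters, it can cut a financial table between its header row and its values, which is one reason a layout-aware strategy may do better on numerical questions.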

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Retrieval-Augmented Generation

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

AI Engineer

Running structured chunking bake-offs on real documents is the kind of evaluation work AI engineers do constantly when shipping RAG products.

This challenge sharpens

document-chunking
rag-evaluation
experiment-design

NLP Engineer

Layout-aware and semantic chunking choices are core NLP-engineer territory in document-AI teams.

This challenge sharpens

semantic-chunking
layout-aware-chunking
document-chunking

Applied AI Scientist

Comparing 4 strategies on the right question-type split and writing a methodology note is applied-AI-scientist judgement work.

This challenge sharpens

experiment-design
rag-evaluation
document-chunking

One more thing

You can put a credential on your CV by Friday.
