Analysis

Chunking Strategy Bake-Off for Financial Filings

Free | Verified credential | 2 weeks | Intermediate

Overview

What this challenge is about.

You receive 40 anonymized 10-K filings and 100 labeled questions: 50 narrative (e.g., "What is the company's main risk factor?") and 50 numerical (e.g., "What was operating income in FY23?"). Implement four chunking strategies: recursive-character (the baseline), semantic-similarity (using embeddings), layout-aware (preserving sections and tables), and hierarchical (summary plus leaf chunks). For each strategy, run an identical retrieval and generation pipeline, then score it on Hits@5 (whether a relevant chunk appears among the top five retrieved) and on a 30-question manual answer-correctness sample. Report results per strategy and per question type, and recommend one strategy. Success means the recommended strategy materially beats the baseline on numerical questions without losing ground on narrative questions.
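The per-strategy, per-question-type Hits@5 report described above could be computed along the following lines. This is a minimal sketch, not the challenge's official scorer: it assumes relevance is judged by character-span overlap with labeled evidence (so that chunks from different strategies are comparable), and the record fields `strategy`, `qtype`, `retrieved`, and `gold` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def overlaps(chunk, gold):
    """True if a chunk and a gold evidence span share characters in the same filing.

    Both arguments are dicts with 'doc_id', 'start', 'end' (end exclusive).
    This span-based notion of relevance is an assumption of this sketch.
    """
    return (
        chunk["doc_id"] == gold["doc_id"]
        and chunk["start"] < gold["end"]
        and gold["start"] < chunk["end"]
    )


def hit_at_k(retrieved, gold_spans, k=5):
    """Return 1 if any of the top-k retrieved chunks overlaps a gold span, else 0."""
    return int(any(overlaps(c, g) for c in retrieved[:k] for g in gold_spans))


def hits_by_strategy_and_type(records, k=5):
    """Average Hits@k for every (strategy, question type) pair.

    records: iterable of dicts with hypothetical keys
      'strategy'  - e.g. 'recursive', 'semantic', 'layout', 'hierarchical'
      'qtype'     - 'narrative' or 'numerical'
      'retrieved' - ranked list of chunk spans
      'gold'      - list of labeled evidence spans for the question
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [hits, question count]
    for r in records:
        key = (r["strategy"], r["qtype"])
        totals[key][0] += hit_at_k(r["retrieved"], r["gold"], k)
        totals[key][1] += 1
    return {key: hits / n for key, (hits, n) in totals.items()}
```

Scoring against source spans rather than chunk IDs keeps the comparison fair, because each strategy produces a different set of chunks from the same filing.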

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Compare 4 chunking strategies on a 10-K research assistant and recommend one that lifts numerical-question accuracy without regressing on narrative.
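One way to make the recommendation rule explicit is to encode it as a small check over the per-type scores. The sketch below is illustrative only: the brief does not say how large a "material" gain is, so `min_gain` and `tolerance` are hypothetical thresholds you would need to choose and justify yourself, and the `scores` layout is an assumed format.

```python
def recommend(scores, baseline="recursive", min_gain=0.10, tolerance=0.0):
    """Pick a strategy that beats the baseline on numerical questions
    without regressing on narrative ones.

    scores: {strategy: {"numerical": float, "narrative": float}} (assumed layout)
    min_gain: hypothetical threshold for a "material" numerical improvement
    tolerance: hypothetical allowance for a narrative drop (0.0 means none)
    Returns the baseline name if no strategy qualifies.
    """
    base = scores[baseline]
    eligible = [
        name
        for name, s in scores.items()
        if name != baseline
        and s["numerical"] - base["numerical"] >= min_gain
        and s["narrative"] >= base["narrative"] - tolerance
    ]
    if not eligible:
        return baseline
    # Among qualifying strategies, prefer the best numerical score.
    return max(eligible, key=lambda name: scores[name]["numerical"])
```

With only 50 questions per type, a single question moves the score by two percentage points, so any threshold should be read with that granularity in mind.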

Earning criteria — what you'll demonstrate

  • Implement and compare 4 chunking strategies on real documents
  • Design a fair evaluation that separates narrative and numerical questions
  • Reason about chunking trade-offs (granularity, table preservation, embedding quality)
  • Document the recommendation for a non-research engineering team
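For orientation, the recursive-character baseline could look roughly like the sketch below. It is a simplified, dependency-free illustration of the idea (split on the coarsest separator first, then fall back to finer ones for oversized pieces), not the implementation of any particular library; the function name, `max_chars` default, and separator order are assumptions.

```python
def recursive_split(text, max_chars=1000, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars characters.

    Tries each separator in order (paragraphs, lines, sentences, words),
    greedily packing pieces into chunks; pieces that are still too long are
    split again with the remaining, finer separators. If no separator
    applies, the text is cut at fixed character offsets.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, buf = [], ""
        for part in text.split(sep):
            candidate = buf + sep + part if buf else part
            if len(candidate) <= max_chars:
                buf = candidate  # piece still fits in the current chunk
                continue
            if buf:
                chunks.append(buf)
            if len(part) > max_chars:
                # Piece alone is too long: recurse with finer separators.
                chunks.extend(recursive_split(part, max_chars, separators[i + 1:]))
                buf = ""
            else:
                buf = part
        if buf:
            chunks.append(buf)
        return chunks

    # No separator found: hard cut at fixed offsets.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]
```

A splitter like this has no notion of tables or section boundaries, which is one reason a baseline of this kind may struggle on numerical questions where the answer sits in a financial table.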

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Engineer

Running structured chunking bake-offs on real documents is the kind of evaluation work AI engineers do constantly when shipping RAG products.

This challenge sharpens

  • document-chunking
  • rag-evaluation
  • experiment-design

NLP Engineer

Layout-aware and semantic chunking choices are core NLP-engineer territory in document-AI teams.

This challenge sharpens

  • semantic-chunking
  • layout-aware-chunking
  • document-chunking

Applied AI Scientist

Comparing 4 strategies on the right question-type split and writing a methodology note is applied-AI-scientist judgement work.

This challenge sharpens

  • experiment-design
  • rag-evaluation
  • document-chunking

One more thing

You can put a credential on your CV by Friday.