Design a Retrieval Pipeline for a Climate-Research Open Archive
Overview
What this challenge is about.
You receive a metadata sample (5,000 documents) plus 50 example researcher queries (mixed-language). Design a retrieval pipeline architecture that: (1) extracts and normalizes structured metadata at ingest, (2) indexes documents lexically (BM25) and densely (multilingual embeddings), (3) applies hybrid retrieval with optional metadata filters, (4) returns results with provenance and language indicators. Implement only a small proof-of-concept on the 5,000-document sample. Deliver: architecture diagram, written specification, proof-of-concept code, and a 4-week implementation plan for two engineers.
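Steps (2) through (4) of the pipeline can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a reference implementation. The brief does not name a fusion method, so reciprocal rank fusion (RRF) is used here as one common choice. The `embed` function is a hashed character-trigram placeholder standing in for a real multilingual embedding model. The `Doc` fields, the `SAMPLE` records, and the filter parameters are all hypothetical.

```python
import math
import zlib
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str
    language: str  # normalized at ingest (hypothetical field)
    year: int
    source: str    # provenance: which repository the record came from


# Made-up records for illustration only; not taken from the challenge sample.
SAMPLE = [
    Doc("d1", "Sea level rise projections for coastal cities", "en", 2021, "repo-a"),
    Doc("d2", "Proyecciones del aumento del nivel del mar en ciudades costeras", "es", 2020, "repo-b"),
    Doc("d3", "Glacier mass balance in the Alps", "en", 2018, "repo-a"),
    Doc("d4", "Drought frequency in the Sahel region", "en", 2022, "repo-c"),
]


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every doc against the query (lexical signal)."""
    tokenized = [tokenize(d.text) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
                score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def embed(text, dim=64):
    """PLACEHOLDER for a multilingual embedding model: hashed character trigrams."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranked list."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return fused


def search(query, docs, language=None, min_year=None, top_k=3):
    """Apply metadata filters, fuse lexical and dense rankings, return annotated hits."""
    candidates = [
        d for d in docs
        if (language is None or d.language == language)
        and (min_year is None or d.year >= min_year)
    ]
    if not candidates:
        return []
    lex = bm25_scores(query, candidates)
    q_vec = embed(query)
    dense = [sum(a * b for a, b in zip(q_vec, embed(d.text))) for d in candidates]
    # Lexical ranking keeps only docs that matched at least one query term.
    lex_rank = [candidates[i].doc_id
                for i in sorted(range(len(candidates)), key=lambda i: lex[i], reverse=True)
                if lex[i] > 0]
    dense_rank = [candidates[i].doc_id
                  for i in sorted(range(len(candidates)), key=lambda i: dense[i], reverse=True)]
    by_id = {d.doc_id: d for d in candidates}
    return [
        {"doc_id": doc_id, "score": score, "language": by_id[doc_id].language,
         "source": by_id[doc_id].source, "year": by_id[doc_id].year}
        for doc_id, score in rrf([lex_rank, dense_rank]).most_common(top_k)
    ]


for hit in search("sea level rise", SAMPLE):
    print(hit)
```

Filtering before scoring keeps the sketch short. A real index would more likely apply filters inside the search engine, and the trigram placeholder would not retrieve across languages the way a trained multilingual model is expected to.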
The Brief
What you'll do, and what you'll demonstrate.
Specify a multilingual, metadata-aware retrieval pipeline that a 2-engineer team can ship in 4 weeks.
Earning criteria — what you'll demonstrate
- Design a retrieval architecture that integrates lexical retrieval, dense retrieval, and metadata filters
- Write a specification a small engineering team can execute against
- Translate research-grade retrieval methods into a constrained delivery plan
- Reason about provenance and language indicators in user-facing search
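Provenance and language indicators in search results are only as reliable as the metadata normalized at ingest. The sketch below shows one possible shape for that normalization step. The raw field names (`id`, `title`, `lang`, `language`, `date`, `year`, `url`), the alias table, and the `NormalizedRecord` type are assumptions made for illustration; the actual sample's schema may differ.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table; a real pipeline would cover every language in the archive.
LANGUAGE_ALIASES = {
    "en": "en", "eng": "en", "english": "en",
    "es": "es", "spa": "es", "spanish": "es", "español": "es",
    "fr": "fr", "fra": "fr", "fre": "fr", "french": "fr", "français": "fr",
}


@dataclass
class NormalizedRecord:
    doc_id: str
    title: str
    language: str            # two-letter code, or "und" when undetermined
    year: Optional[int]
    source: str              # provenance: name of the originating repository
    source_url: Optional[str]


def normalize_language(raw):
    """Map free-text language labels and regional tags to a short code."""
    if not raw:
        return "und"
    key = re.split(r"[-_]", raw.strip().lower())[0]  # "en-US" -> "en"
    return LANGUAGE_ALIASES.get(key, "und")


def extract_year(raw):
    """Pull a plausible four-digit year (1800-2099) out of a date-like value."""
    if raw is None:
        return None
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", str(raw))
    return int(match.group(1)) if match else None


def normalize(raw, source):
    """Turn one raw metadata dict into a record ready for indexing."""
    return NormalizedRecord(
        doc_id=str(raw["id"]).strip(),
        title=" ".join(str(raw.get("title", "")).split()),  # collapse whitespace
        language=normalize_language(raw.get("lang") or raw.get("language")),
        year=extract_year(raw.get("date") or raw.get("year")),
        source=source,
        source_url=raw.get("url"),
    )


record = normalize(
    {"id": " 42 ", "title": "  Sea   level rise ", "lang": "en-US", "date": "2021-03-15"},
    source="repo-a",
)
print(record)
```

Returning `"und"` instead of guessing lets the interface show an honest "language unknown" indicator, and carrying `source` on every record means provenance never has to be reconstructed at query time.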
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Solutions Architect
Owning a retrieval-architecture spec for a real client team is day-to-day work for an AI solutions architect at a consulting or platform organization.
This challenge sharpens
- retrieval-architecture
- spec-writing
- implementation-planning
AI Engineer
Designing the hybrid-retrieval and metadata stack and proving it on a sample is the AI-engineer skill set that platform teams hire for.
This challenge sharpens
- hybrid-search
- metadata-extraction
- multilingual-search
Data Engineer
Specifying the ingest-and-extract pipeline plus the implementation plan is core data-engineering territory.
This challenge sharpens
- metadata-extraction
- retrieval-architecture
- implementation-planning