Build a Vector-Search Backend for an Enterprise AI Knowledge Assistant

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive a corpus of around 20,000 PDFs (mixed scanned and digital) totalling around 30 GB and a labeled retrieval set of 200 queries with human-judged ground-truth passages. Build the parsing-plus-chunking pipeline (text extraction, OCR fallback for scans, semantic chunking), an embedding pipeline using an open embedding model, and a hybrid (vector + BM25) retrieval API. Success is recall-at-10 above 0.85 on the labeled set, ingest throughput documented in pages-per-minute, and per-query latency under 300 milliseconds at p95.
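The brief does not say how the vector and BM25 results should be combined. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only. The function name, the chunk IDs, and the constant `k=60` are assumptions for this example, not part of the challenge specification.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of chunk IDs into one ranking.

    Each chunk scores 1 / (k + rank) per list it appears in; k=60 is a
    conventional default and an assumption here, not a challenge requirement.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical top-3 results from each retriever for one query.
vector_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Because RRF uses only ranks, it avoids having to normalize cosine similarities against BM25 scores, which live on different scales. Weighted score blending is an equally valid choice and should be judged on the labeled set.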

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Build a RAG ingest-and-retrieval backend that hits recall-at-10 above 0.85 and p95 latency under 300 ms on an enterprise PDF corpus.
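The p95 latency target can be checked with a small timing harness. The sketch below uses the nearest-rank percentile definition and a hypothetical `search_fn` callable standing in for the retrieval API. Both are assumptions, since the brief does not state how p95 is computed or measured.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(search_fn, queries):
    """Time each call to search_fn (a hypothetical retrieval callable) in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

With the 200 labeled queries, `percentile(measure_latencies_ms(search_fn, queries), 95) < 300` would express the target. Measuring end to end through the API rather than in-process gives a number closer to what a client would see.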

Earning criteria — what you'll demonstrate

Design a chunking strategy informed by retrieval evaluation
Operate an embedding pipeline at corpus scale
Combine vector and lexical retrieval into a hybrid system
Measure retrieval quality with standard metrics (recall@k, MRR)
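The two metrics named above can be computed as in the following minimal sketch. It assumes each labeled query comes with a set of ground-truth passage IDs and that recall@k is the fraction of those passages found in the top k results. The challenge's exact scoring rules are not given here, and the function names and data shapes are illustrative.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=10):
    """runs: one (retrieved_ids, relevant_ids) pair per query.

    Returns (mean recall@k, MRR) averaged over all queries.
    """
    n = len(runs)
    mean_recall = sum(recall_at_k(r, g, k) for r, g in runs) / n
    mrr = sum(reciprocal_rank(r, g) for r, g in runs) / n
    return mean_recall, mrr


# Two hypothetical queries, evaluated at k=2 to keep the example small.
runs = [
    (["a", "b", "c"], ["b"]),
    (["x", "y", "z"], ["q", "x"]),
]
mean_recall, mrr = evaluate(runs, k=2)
```

Running the same harness after each change to chunk size or overlap is one way to let retrieval evaluation inform the chunking strategy.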

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Data Engineering and Big Data Systems

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

AI Engineer

Building production-grade RAG retrieval backends is among the most common requirements in AI-engineer job descriptions right now; in this challenge you ship the load-bearing piece.

This challenge sharpens

rag
vector-search
embeddings

Data Engineer

Corpus-scale ingest with parsing fallbacks and resumability is core data-engineering work that supports any RAG or search team.

This challenge sharpens

document-parsing
python
retrieval-evaluation

Machine Learning Engineer

Owning the retrieval-evaluation harness with recall@k and MRR mirrors how MLEs run model evals at scale.

This challenge sharpens

retrieval-evaluation
embeddings
vector-search

One more thing

You can put a credential on your CV by Friday.
