Build Semantic Search for an Internal Engineering Wiki

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You receive a Confluence XML export (~12k pages, ~80 MB of text) and a hand-labeled benchmark of 50 internal queries with ground-truth doc IDs. Chunk and embed the corpus with a sentence-transformers model, index it in pgvector on a single Postgres instance, expose a /search HTTP endpoint, and measure recall@5 and MRR@10 on the benchmark. Compare two encoders (a small all-MiniLM-L6-v2 baseline and a larger bge-base) on quality and per-query latency. Success is recall@5 above 0.80 on the benchmark with p95 query latency under 200 ms on a 4-vCPU box. Wrap the service in Docker with a one-page README the platform team can follow.
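
As a rough illustration of the indexing step described above, the sketch below builds the SQL that a pgvector-backed store might use. The table name `wiki_chunks`, its columns, and the helper names are hypothetical, and the 768-dimension entry assumes "bge-base" refers to a BGE base-size checkpoint. The `hnsw` index type needs pgvector 0.5.0 or later, and the `m` and `ef_construction` values shown are pgvector's defaults rather than tuned settings.

```python
# Sketch only: table, column, and function names are illustrative assumptions.

# Output dimensions of the two encoders being compared.
# The bge-base entry assumes a BGE base-size checkpoint (768 dimensions).
EMBEDDING_DIMS = {"all-MiniLM-L6-v2": 384, "bge-base": 768}


def schema_sql(dim: int, table: str = "wiki_chunks") -> list[str]:
    """Return DDL for a chunk table with a cosine-distance HNSW index."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        " id bigserial PRIMARY KEY,"
        " doc_id text NOT NULL,"
        " content text NOT NULL,"
        f" embedding vector({dim}) NOT NULL);",
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw ON {table}"
        " USING hnsw (embedding vector_cosine_ops)"
        " WITH (m = 16, ef_construction = 64);",
    ]


def to_vector_literal(values: list[float]) -> str:
    """Format a list of numbers in pgvector's text input form, e.g. '[0.1,2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def search_sql(table: str = "wiki_chunks") -> str:
    """Top-k query with named parameters q (vector literal) and k (limit).

    <=> is pgvector's cosine distance operator, so smaller is closer.
    """
    return (
        f"SELECT doc_id, content, embedding <=> %(q)s::vector AS distance"
        f" FROM {table} ORDER BY embedding <=> %(q)s::vector LIMIT %(k)s;"
    )
```

A /search handler could run `search_sql()` through a Postgres driver that supports named parameters, passing `{"q": to_vector_literal(query_embedding), "k": 5}`. Because several chunks can belong to one page, the returned `doc_id` values may need de-duplicating before scoring.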

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Stand up a sandbox semantic-search service over the internal engineering wiki that hits recall@5 ≥ 0.80 on a labeled benchmark at sub-200 ms p95 latency.
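
The two targets in this brief can be checked with a few lines of plain Python. The sketch below is one possible scoring script, not the challenge's official grader: the function names and the input shapes (a dict of ranked doc IDs per query, a dict of ground-truth doc ID sets per query) are assumptions. It treats recall@k as the fraction of a query's relevant docs found in the top k, and uses the nearest-rank definition of p95. Some benchmarks instead score recall@k as a hit rate, so check which definition the benchmark intends.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant doc IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """1 / rank of the first relevant result in the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of the measured latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate(
    runs: dict[str, list[str]],
    gold: dict[str, set[str]],
    latencies_ms: list[float],
) -> dict[str, object]:
    """Average the per-query metrics and compare them with the brief's targets."""
    queries = list(gold)
    recall = sum(recall_at_k(runs.get(q, []), gold[q]) for q in queries) / len(queries)
    mrr = sum(reciprocal_rank_at_k(runs.get(q, []), gold[q]) for q in queries) / len(queries)
    latency = p95(latencies_ms)
    return {
        "recall@5": recall,
        "mrr@10": mrr,
        "p95_ms": latency,
        "passed": recall >= 0.80 and latency < 200.0,
    }
```

Running `evaluate` once per encoder on the same benchmark gives a like-for-like comparison of quality and latency.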

Earning criteria — what you'll demonstrate

Pick an embedding model appropriate to corpus size and latency budget
Apply chunking strategies (fixed-size vs structural) and measure their impact on retrieval quality
Operate pgvector with HNSW indexes inside Postgres
Evaluate retrieval with recall@k and MRR on a hand-labeled benchmark
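
To make the chunking criterion above concrete, here is a minimal sketch of the two strategies. It assumes each wiki page has already been converted from the Confluence export to plain text with Markdown-style `#` heading lines. That conversion step, the function names, and the default window sizes are all illustrative assumptions.

```python
import re


def fixed_size_chunks(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Sliding window of `size` words, with `overlap` words shared between neighbors."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A heading is a line starting with 1 to 6 '#' characters, then a space and a title.
HEADING = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)


def structural_chunks(text: str) -> list[str]:
    """One chunk per heading-delimited section.

    Any text before the first heading becomes its own chunk.
    """
    starts = [m.start() for m in HEADING.finditer(text)]
    bounds = sorted(set([0] + starts + [len(text)]))
    sections = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]
```

Indexing the corpus once per strategy and scoring both indexes on the same benchmark shows how much the chunk boundaries affect retrieval quality.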

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Vector Databases and Embeddings

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

AI Engineer
AI Engineering

AI Engineer

Standing up a retrieval service end-to-end (embed, index, serve, measure) is the day-one job description of an AI engineer at any company shipping Retrieval-Augmented Generation features.

This challenge sharpens

embedding-models
vector-search
pgvector

Machine Learning Engineer

Treating retrieval as a measured system with offline benchmarks and latency budgets mirrors how MLEs ship ranking and recommendation services.

This challenge sharpens

evaluation
embedding-models
python

Data Engineer

Building the ingestion and chunking pipeline and operating pgvector inside Postgres are core skills data engineers use when standing up vector workloads alongside OLTP data.

This challenge sharpens

chunking-strategy
pgvector
python

One more thing

You can put a credential on your CV by Friday.