Code

Build Semantic Search for an Internal Engineering Wiki

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You receive a Confluence XML export (~12k pages, ~80 MB of text) and a hand-labeled benchmark of 50 internal queries with ground-truth doc IDs. Chunk and embed the corpus with a sentence-transformers model, index it in pgvector on a single Postgres instance, expose a /search HTTP endpoint, and measure recall@5 and MRR@10 on the benchmark. Compare two encoders (a small all-MiniLM-L6-v2 baseline and a larger bge-base) on quality and per-query latency. Success is recall@5 above 0.80 on the benchmark with p95 query latency under 200 ms on a 4-vCPU box. Wrap the service in Docker with a one-page README the platform team can follow.
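The two benchmark metrics can be sketched in plain Python. This is a minimal illustration, not the challenge's official scorer: it assumes recall@5 means the fraction of a query's ground-truth doc IDs found in the top 5 results (some teams instead score a hit if any relevant doc appears), and that MRR@10 uses the rank of the first relevant result within the top 10. The names `recall_at_k`, `reciprocal_rank_at_k`, and `evaluate` are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant doc IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1 / rank of the first relevant result in the top-k, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(results: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> Dict[str, float]:
    """Average both metrics over every query in the labeled benchmark.

    `results` maps query -> ranked doc IDs returned by the search service;
    `truth` maps query -> ground-truth doc IDs. Queries missing from
    `results` are scored as if nothing was returned.
    """
    n = len(truth)
    if n == 0:
        raise ValueError("benchmark is empty")
    recall = sum(recall_at_k(results.get(q, []), rel, 5) for q, rel in truth.items()) / n
    mrr = sum(reciprocal_rank_at_k(results.get(q, []), rel, 10) for q, rel in truth.items()) / n
    return {"recall@5": recall, "mrr@10": mrr}
```

Running `evaluate` once per encoder on the same 50 queries gives a like-for-like quality comparison between the MiniLM baseline and bge-base.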

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Stand up a sandbox semantic-search service over the internal engineering wiki that hits recall@5 ≥ 0.80 on a labeled benchmark at sub-200 ms p95 latency.
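One way to check the sub-200 ms p95 target is to time each benchmark query and take the 95th percentile of the samples. The sketch below assumes the nearest-rank percentile definition and a hypothetical `search` callable that wraps a request to the /search endpoint; both are illustrative choices, not requirements stated in the brief.

```python
import math
import time
from typing import Callable, List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def time_queries(search: Callable[[str], object], queries: List[str]) -> List[float]:
    """Return wall-clock latency in milliseconds for each query.

    `search` is a hypothetical callable standing in for a call to the
    /search endpoint; its return value is ignored here.
    """
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

With only 50 benchmark queries, p95 is determined by the third-slowest request, so repeating the run several times and pooling the samples gives a steadier estimate.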

Earning criteria — what you'll demonstrate

  • Pick an embedding model appropriate to corpus size and latency budget
  • Apply chunking strategies (fixed-size vs structural) and measure their impact on retrieval quality
  • Operate pgvector with HNSW indexes inside Postgres
  • Evaluate retrieval with recall@k and MRR on a hand-labeled benchmark
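The two chunking strategies named in the criteria can be contrasted with a small sketch. It assumes each wiki page has already been converted from the Confluence export to plain text in which heading lines start with `#`; that convention, the word-based window sizes, and the function names are all illustrative assumptions, not part of the brief.

```python
from typing import List


def fixed_size_chunks(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def structural_chunks(text: str, heading_prefix: str = "#") -> List[str]:
    """Split text at heading lines so each chunk is one heading plus its body.

    Assumes headings are lines beginning with `heading_prefix`
    (a hypothetical convention for the preprocessed pages).
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(heading_prefix) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]
```

Embedding the corpus once per strategy and scoring both against the same benchmark isolates the effect of chunking from the choice of encoder.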

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Engineer

Standing up a retrieval service end-to-end (embed, index, serve, measure) is the day-one job description of an AI engineer at any company shipping Retrieval-Augmented Generation features.

This challenge sharpens

  • embedding-models
  • vector-search
  • pgvector

Machine Learning Engineer

Treating retrieval as a measured system with offline benchmarks and latency budgets mirrors how MLEs ship ranking and recommendation services.

This challenge sharpens

  • evaluation
  • embedding-models
  • python

Data Engineer

Building the ingestion and chunking pipeline and operating pgvector inside Postgres are core skills data engineers use when standing up vector workloads alongside OLTP data.

This challenge sharpens

  • chunking-strategy
  • pgvector
  • python

One more thing

You can put a credential on your CV by Friday.
