Benchmark Approximate Nearest-Neighbor Indexes for a Code-Search Startup
Overview
What this challenge is about.
You receive a 5 M-vector sample (768-dim, float32) and a labeled benchmark of 1,000 queries, each with its ground-truth top-50 neighbors. Index the same sample in Chroma, Qdrant, and Weaviate (all HNSW-based), and tune each so that recall@10 across the three lands within 5 percentage points. Then measure per-query p50/p95 latency at concurrency 1 and 16, RAM footprint, on-disk size, and index build time. Re-run at 10 M vectors if your VM has the headroom. Write a 3-page memo with one clear recommendation, the trade-off table behind it, and a list of what changes at 200 M vectors.
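The two headline metrics, recall@10 and p50/p95 latency, can be computed with a few lines of plain Python. The sketch below is illustrative only: the function names are hypothetical, recall@k is assumed to mean the overlap between the retrieved top-k and the first k entries of the ground-truth list, and percentiles use the nearest-rank method (other conventions interpolate and give slightly different values).

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Mean fraction of the true top-k neighbors present in the retrieved top-k.

    retrieved / ground_truth: one list of neighbor IDs per query, best first.
    Assumption: ground truth may be longer than k (e.g. top-50); only its
    first k entries are used.
    """
    hits = 0
    for got, truth in zip(retrieved, ground_truth):
        hits += len(set(got[:k]) & set(truth[:k]))
    return hits / (k * len(ground_truth))


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Return the p50 and p95 of a list of per-query latencies."""
    return {
        "p50": percentile_nearest_rank(latencies_ms, 50),
        "p95": percentile_nearest_rank(latencies_ms, 95),
    }
```

Collect one latency sample per query at each concurrency level (1 and 16) and summarize each run separately, so the memo's table compares like with like.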
The Brief
What you'll do, and what you'll demonstrate.
Pick the production approximate-nearest-neighbor store for a code-search workload by benchmarking Chroma, Qdrant, and Weaviate on recall, latency, RAM, on-disk size, and build time, with all three tuned to the same recall@10 operating point.
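One way to reach a shared operating point is to sweep the query-time search width (ef_search, or each engine's equivalent) and keep the smallest value that meets a target recall. The sketch below assumes a hypothetical `measure_recall(ef)` callable that you would write per engine to run the 1,000-query benchmark and return recall@10; nothing here calls a real client API.

```python
def smallest_ef_for_recall(measure_recall, target, candidates):
    """Return (ef, recall) for the smallest candidate ef that meets the target.

    measure_recall: hypothetical callable, ef -> recall@10 in [0, 1].
    Returns None if no candidate reaches the target.
    """
    for ef in sorted(candidates):
        recall = measure_recall(ef)
        if recall >= target:
            return ef, recall
    return None


def within_band(recalls, band=0.05):
    """True if all engines' recall values lie within `band` of each other.

    recalls: dict of engine name -> recall@10. The default band of 0.05
    corresponds to the 5-percentage-point window in the brief.
    """
    spread = max(recalls.values()) - min(recalls.values())
    return spread <= band + 1e-9  # small tolerance for float rounding
```

Running the sweep once per engine and then checking `within_band` on the three resulting recall values gives a simple gate before any latency numbers are recorded.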
Earning criteria — what you'll demonstrate
- Understand HNSW parameters (M, ef_construction, ef_search) and how they trade quality for latency
- Design a fair vector-store benchmark at matched recall
- Project capacity from a 5 M-vector measurement to a 200 M-vector production target
- Defend an infrastructure recommendation to engineering leadership in writing
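For the capacity projection, a back-of-the-envelope calculation is a useful sanity check on whatever you measure. The sketch below makes two labeled assumptions: the HNSW estimate uses a common rule of thumb of roughly 2 × M four-byte neighbor IDs per vector on top of the raw float32 data, and the linear projection assumes RAM grows proportionally with vector count. The default M = 16 is only a placeholder. Real engines add their own overhead (payloads, upper graph layers, caches, quantization), so treat these as lower bounds rather than predictions.

```python
def raw_vector_bytes(n, dim=768, bytes_per_value=4):
    """Bytes needed for the float32 vectors alone."""
    return n * dim * bytes_per_value


def hnsw_estimate_bytes(n, dim=768, m=16):
    """Rule-of-thumb estimate: raw vectors plus ~2*M 4-byte links per vector.

    Assumption: m=16 is a placeholder; use the M you actually configured.
    """
    return n * (dim * 4 + 2 * m * 4)


def linear_projection(measured_bytes, measured_n, target_n):
    """Scale a measured footprint linearly to a larger vector count."""
    return measured_bytes * target_n / measured_n


GB = 1_000_000_000  # decimal gigabytes
print(raw_vector_bytes(5_000_000) / GB)    # 15.36
print(raw_vector_bytes(200_000_000) / GB)  # 614.4
```

Even before index overhead, the raw vectors alone grow from about 15 GB at 5 M to about 614 GB at 200 M, which is why the memo's "what changes at 200 M" list typically has to address sharding, quantization, or on-disk indexes.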
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
MLOps Engineer
Picking and sizing the right infra for a vector workload is core MLOps work at any AI-product company scaling past the prototype phase.
This challenge sharpens
- ann-indexes
- capacity-planning
- benchmarking
Data Engineer
Operating vector stores alongside OLTP and warehouse systems is becoming standard data-engineering scope; this challenge gives directly relevant operating experience.
This challenge sharpens
- vector-databases
- hnsw
- capacity-planning
AI Solutions Architect
Translating a benchmark into a written trade-off recommendation that an exec can sign off on is the day-to-day deliverable of an AI solutions architect.
This challenge sharpens
- benchmarking
- vector-databases
- capacity-planning
AI Engineer
Knowing how HNSW parameters move recall and latency is table stakes for any AI engineer shipping retrieval features against a managed vector store.
This challenge sharpens
- hnsw
- ann-indexes
- python