Overview
What this challenge is about.
You receive 12 weekly snapshots, each containing roughly 50,000 document embeddings (1024-dim). Design and build a visualization tool that: (a) projects each snapshot to 2D in a reference frame that stays consistent across weeks, (b) surfaces clusters that grew, shrank, or shifted, and (c) lists the top 20 documents that moved most between consecutive weeks. Use UMAP with anchor points for cross-week stability. The deliverables are the notebook tool, three weekly screenshots interpreted in plain English, and a one-page rollout brief for the risk function.
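As a starting point for requirement (c), here is a minimal sketch of one way to rank the documents that moved most between two consecutive weeks. It rests on assumptions the brief does not state: the same documents appear in both snapshots in the same row order, and "movement" is measured as cosine distance in the original 1024-dim space rather than in the 2D projection. The function name `top_movers` and its parameters are hypothetical.

```python
import numpy as np

def top_movers(prev, curr, doc_ids, k=20):
    """Rank documents by cosine distance between two consecutive snapshots.

    Assumes row i of `prev` and row i of `curr` describe the same document
    (an assumption for this sketch, not something the brief guarantees).
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    # Normalize each row so the row-wise dot product is cosine similarity.
    prev_n = prev / np.linalg.norm(prev, axis=1, keepdims=True)
    curr_n = curr / np.linalg.norm(curr, axis=1, keepdims=True)
    dist = 1.0 - np.sum(prev_n * curr_n, axis=1)
    # Largest distances first, capped at the number of documents available.
    k = min(k, len(dist))
    order = np.argsort(-dist)[:k]
    return [(doc_ids[i], float(dist[i])) for i in order]
```

Calling `top_movers(week_3, week_4, doc_ids)` on two aligned arrays would return up to 20 `(doc_id, distance)` pairs, largest movement first. Measuring in the original space avoids confusing real drift with distortion introduced by the 2D projection, though you may prefer a different metric.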
The Brief
What you'll do, and what you'll demonstrate.
Build a weekly embedding-drift visualization that risk reviewers trust enough to approve a rollout.
Earning criteria — what you'll demonstrate
- Project high-dim embeddings consistently across snapshots
- Detect and visualize cluster-level change over time
- Communicate model behavior to a risk-function audience
- Build notebook-grade tools the rest of the team actually uses
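For the cluster-level criterion above, a simple first pass is to compare cluster sizes week over week. The sketch below assumes cluster labels are comparable across weeks (for example, assigned by a single clustering model fit on a reference week) and that labels are integers; neither is specified in the brief. The name `cluster_size_changes` is hypothetical.

```python
from collections import Counter

def cluster_size_changes(labels_prev, labels_curr):
    """Summarize how many documents each cluster gained or lost.

    Assumes the same label refers to the same cluster in both weeks
    (an assumption for this sketch).
    """
    prev_counts = Counter(labels_prev)
    curr_counts = Counter(labels_curr)
    changes = {}
    # Cover clusters that appear in either week, including new or vanished ones.
    for cluster in sorted(set(prev_counts) | set(curr_counts)):
        before = prev_counts.get(cluster, 0)
        after = curr_counts.get(cluster, 0)
        changes[cluster] = {"before": before, "after": after, "delta": after - before}
    return changes
```

A positive `delta` marks a cluster that grew and a negative one a cluster that shrank. Detecting a cluster that shifted position would need a separate measure, such as the distance between its centroids in consecutive weeks.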
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Engineer
Owning observability tools for retrieval and embedding systems is a core AI engineer specialization at enterprise-AI startups.
This challenge sharpens
- embeddings
- drift-detection
- python
AI Safety Researcher
Designing tools that let risk reviewers audit model behavior is exactly the kind of work AI safety researchers do for enterprise rollouts.
This challenge sharpens
- drift-detection
- interactive-visualization
- embeddings
Data Scientist
Tracking cluster-level change in a 50k-doc corpus over time is the kind of exploratory analytical work data scientists own.
This challenge sharpens
- dimensionality-reduction
- umap
- interactive-visualization