Overview
What this challenge is about.
Receive the current worker (Rust, around 8,000 lines, uses rayon for its parallelism), the host (2-socket AMD EPYC 9354, 64 cores total, 384GB DDR5), and a benchmark query workload. Redesign: pin one worker pool per socket via core_affinity, allocate per-socket arenas using a NUMA-aware allocator (mimalloc with NUMA pages or jemalloc with the right MALLOC_ARENA_MAX), shard incoming work by hash(query.key) % 2 sockets, allow cross-socket stealing only when a worker is fully idle. Validate with perf c2c that cross-socket cache-coherence traffic drops. Benchmark scaling from 1 to 64 cores; the goal is at least 75 percent efficiency at 32 cores. Deliver the rewritten worker, the NUMA-tuning configuration, the perf c2c before/after, the scaling-curve plot, and a 5-page platform-team write-up.
The Brief
What you'll do, and what you'll demonstrate.
Redesign a multicore query-execution worker to scale efficiently to 32 cores on a 2-socket box using NUMA-aware sharding and prove the collapse goes away.
Earning criteria — what you'll demonstrate
- Apply NUMA-aware sharding and per-socket worker pools
- Use perf c2c to measure cross-socket coherence traffic
- Configure NUMA-aware allocators correctly
- Benchmark scaling efficiency, not just speedup
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.