Tune OpenMP Performance on a Memory-Bound Genomics Pipeline
Overview
What this challenge is about.
Profile the existing pipeline at 1, 4, 8, 16, 32, 48, and 64 threads using Intel VTune and Linux perf. Identify the bottlenecks; likely candidates are NUMA-unaware memory allocation, false sharing on a hot counter, and memory-bandwidth saturation in a particular kernel. Apply targeted optimizations: NUMA-aware allocation via libnuma, padding to eliminate false sharing, and loop restructuring for cache locality. Re-benchmark at the same thread counts. Deliver the tuned source code, the strong-scaling study, and an 8-page writeup that includes before/after VTune evidence for each optimization.
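To make the false-sharing fix concrete, here is a minimal sketch of padded per-thread counters. It is an illustration under stated assumptions, not the pipeline's actual code: the names CACHE_LINE, MAX_SLOTS, padded_counter_t, and count_matches are hypothetical, and the 64-byte line size is an assumption that should be confirmed on the target CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE 64  /* assumed cache-line size in bytes; verify on the target CPU */
#define MAX_SLOTS  64  /* hypothetical upper bound on the number of threads used */

/* One counter per thread, aligned so that no two counters share a cache line. */
typedef struct {
    _Alignas(CACHE_LINE) uint64_t value;
} padded_counter_t;

static_assert(sizeof(padded_counter_t) == CACHE_LINE,
              "each counter should occupy exactly one cache line");

/* Counts the characters in seq equal to target.
 * Each thread increments only its own padded slot, so the hot
 * counter no longer bounces between cores. */
uint64_t count_matches(const char *seq, size_t n, char target)
{
    padded_counter_t counts[MAX_SLOTS];
    memset(counts, 0, sizeof counts);

    int nthreads = 1;
#ifdef _OPENMP
    nthreads = omp_get_max_threads();
    if (nthreads > MAX_SLOTS)
        nthreads = MAX_SLOTS;  /* keep every thread id inside the array */
#endif

    #pragma omp parallel num_threads(nthreads)
    {
        int slot = 0;
#ifdef _OPENMP
        slot = omp_get_thread_num();
#endif
        #pragma omp for schedule(static)
        for (size_t i = 0; i < n; i++) {
            if (seq[i] == target)
                counts[slot].value++;
        }
    }

    /* Serial reduction over the per-thread slots. */
    uint64_t total = 0;
    for (int t = 0; t < nthreads; t++)
        total += counts[t].value;
    return total;
}
```

For a simple sum like this one, an OpenMP reduction clause or a thread-local accumulator would likely be the simpler fix; explicit padding matters most when per-thread state has to stay in a shared array. Either way, the change should be confirmed with a before/after profile rather than assumed.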
The Brief
What you'll do, and what you'll demonstrate.
Tune an OpenMP variant-calling pipeline that scales poorly at 16 threads so that it scales usefully to at least 48 threads on a dual-socket NUMA node.
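On a dual-socket node, where a page physically lives matters as much as how much memory is allocated. The sketch below shows first-touch placement, a technique that complements (and is not the same as) explicit libnuma allocation: under Linux's default local-allocation policy, a page is placed on the NUMA node of the thread that first writes it. The function names alloc_first_touch and sum_static are hypothetical, and the sketch assumes threads are pinned (for example with OMP_PROC_BIND and OMP_PLACES) so that the thread that initializes a chunk also processes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates n doubles and initializes them in parallel.
 * With a static schedule, each thread writes a contiguous chunk first,
 * so those pages are expected to land on that thread's NUMA node
 * (assuming the default first-touch policy and pinned threads). */
double *alloc_first_touch(size_t n, double init)
{
    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return NULL;

    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        buf[i] = init;

    return buf;
}

/* Sums the buffer with the same static schedule used for initialization,
 * so each thread mostly reads pages it touched first. */
double sum_static(const double *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    double total = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

The key design point is that initialization and computation use the same schedule and the same thread count; a serial initialization loop would place every page on a single node regardless of how the compute loop is parallelized.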
Earning criteria — what you'll demonstrate
- Profile OpenMP code with Intel VTune to identify memory bottlenecks
- Apply NUMA-aware allocation on a multi-socket node
- Identify and eliminate false sharing through padding and restructuring
- Validate optimizations with before/after measurement, not intuition
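Validating by measurement usually comes down to two derived numbers per thread count: speedup relative to the 1-thread run, and parallel efficiency (speedup divided by thread count). The following is a small sketch of that calculation; scaling_point_t and compute_scaling are hypothetical names, and the wall-clock times are assumed to come from your own benchmark runs.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a strong-scaling study. */
typedef struct {
    int    threads;     /* thread count for this run */
    double seconds;     /* measured wall-clock time */
    double speedup;     /* T(1) / T(p), filled in by compute_scaling */
    double efficiency;  /* speedup / p, filled in by compute_scaling */
} scaling_point_t;

/* Fills in speedup and efficiency for every row.
 * The first row must be the 1-thread baseline.
 * Returns 0 on success, -1 if the input is invalid. */
int compute_scaling(scaling_point_t *pts, size_t n)
{
    if (pts == NULL || n == 0 || pts[0].threads != 1 || pts[0].seconds <= 0.0)
        return -1;

    const double t1 = pts[0].seconds;
    for (size_t i = 0; i < n; i++) {
        if (pts[i].threads <= 0 || pts[i].seconds <= 0.0)
            return -1;
        pts[i].speedup = t1 / pts[i].seconds;
        pts[i].efficiency = pts[i].speedup / (double)pts[i].threads;
    }
    return 0;
}
```

Running this once on the baseline timings and once on the tuned timings gives directly comparable efficiency curves, which is one way to show where useful scaling ends before and after each optimization.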
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.