Overview
What this challenge is about.
Profile the existing kernel on the provided 12-dataset benchmark. Then pick one of two strategies: (a) add an LLVM IR-level vectorization pass that plugs into the existing pipeline, or (b) write a small C++ generator that emits AVX-512 intrinsics for the loop shape. Either way, show that your dependency analysis is correct (the target loop has no loop-carried dependencies), vectorize the loop, and ship it. Deliver a benchmark report comparing the scalar and vectorized kernels on all 12 datasets, validate bit-exact numerical equivalence on each one, and write a 6-page design note explaining your choice of approach.
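To make the "no loop-carried dependencies" requirement concrete, here is a minimal sketch in C++. The two kernels (`scale_add` and `prefix_sum`) are hypothetical stand-ins, not the challenge's actual kernel, and the sketch assumes `x` and `y` do not alias. Running each loop in reverse order is only a quick sanity check, not a proof: an independent loop gives the same result in any iteration order, while a loop with a carried dependency generally does not.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel A: iteration i reads and writes only index i,
// so no iteration depends on a value produced by another iteration.
void scale_add(std::vector<double>& y, const std::vector<double>& x, double a) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = y[i] + a * x[i];
    }
}

// Same body as scale_add, with the iterations visited in reverse order.
void scale_add_reversed(std::vector<double>& y, const std::vector<double>& x, double a) {
    for (std::size_t i = y.size(); i-- > 0;) {
        y[i] = y[i] + a * x[i];
    }
}

// Hypothetical kernel B: iteration i reads y[i - 1], which iteration i - 1
// has just written. This is a loop-carried read-after-write dependency
// with distance 1, so the iterations cannot simply run side by side.
void prefix_sum(std::vector<double>& y) {
    for (std::size_t i = 1; i < y.size(); ++i) {
        y[i] = y[i] + y[i - 1];
    }
}

// Same body as prefix_sum, with the iterations visited in reverse order.
void prefix_sum_reversed(std::vector<double>& y) {
    for (std::size_t i = y.size(); i-- > 1;) {
        y[i] = y[i] + y[i - 1];
    }
}
```

On the input {1, 2, 3, 4}, `prefix_sum` produces {1, 3, 6, 10} while `prefix_sum_reversed` produces {1, 3, 5, 7}, which exposes the dependency. `scale_add` and `scale_add_reversed` agree on any input. A real argument for the target loop would rest on its actual access pattern and aliasing guarantees rather than on a reordering test.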
The Brief
What you'll do, and what you'll demonstrate.
Vectorize an inner numerical loop that accounts for 38 percent of runtime for AVX-512, achieve at least a 3x speedup, and prove bit-exact numerical equivalence with the scalar baseline.
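One way to check bit-exact equivalence is to compare raw IEEE-754 bit patterns instead of using `==`. The sketch below is a minimal illustration, assuming the kernel outputs are available as `std::vector<double>`; the helper names `bits_of` and `bit_exact_equal` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return the raw 64-bit pattern of a double without violating aliasing rules.
std::uint64_t bits_of(double v) {
    std::uint64_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}

// True only if both outputs have the same length and every element has an
// identical bit pattern. Unlike operator==, this distinguishes +0.0 from
// -0.0 and treats two NaNs with the same payload as equal.
bool bit_exact_equal(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (bits_of(a[i]) != bits_of(b[i])) {
            return false;
        }
    }
    return true;
}
```

A bitwise check is stricter than a tolerance check, which is the point here. Floating-point addition is not associative, so a vectorized reduction that sums in a different order than the scalar loop can fail this test even when the results agree to many digits.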
Earning criteria — what you'll demonstrate
- Perform loop dependency analysis on a real numerical kernel
- Vectorize for AVX-512 with awareness of masking, gather/scatter, and tail handling
- Validate that vectorization preserves numerical results bit-exactly
- Benchmark on representative datasets and report confidence intervals
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.