Overview
What this challenge is about.
Profile the CPU Monte Carlo (MC) engine to identify the kernel candidates: path generation (Brownian motion with correlated factors), payoff evaluation, and aggregation. Port these to CUDA: use cuRAND for path generation, design kernels for coalesced memory access, and use shared memory for intermediate reductions. Validate numerical correctness against the CPU baseline at 1M, 10M, and 100M paths. Measure speedup, occupancy, and roofline placement. Deliver the CUDA source, a correctness report (maximum relative error per metric), and an 8-page writeup. Target an end-to-end speedup of at least 30x over the 64-core CPU baseline.
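A minimal CPU reference sketch of the three kernel candidates named above. It assumes a hypothetical two-factor geometric Brownian motion (GBM) model and a basket-call payoff; the real engine's model, payoffs, and metrics are not specified here, so `ModelParams`, `simulate_path`, `basket_call_payoff`, and `price_basket` are illustrative names only. `std::mt19937_64` stands in for cuRAND. In a CUDA port, a common mapping is one path per thread with its own cuRAND state, which is why each path's loop here is independent of every other path.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical two-factor model parameters (illustrative, not the engine's real model).
struct ModelParams {
    double s0[2];     // initial levels of the two risk factors
    double r;         // risk-free rate, used as the risk-neutral drift
    double sigma[2];  // volatilities
    double rho;       // correlation between the two Brownian drivers, in [-1, 1]
    double T;         // horizon in years
    int steps;        // time steps per path
};

// 1) Path generation: correlated Brownian increments drive two GBM factors.
// The 2x2 Cholesky factor maps independent normals (z1, z2) to correlated ones.
void simulate_path(const ModelParams& p, std::mt19937_64& rng, double terminal[2]) {
    std::normal_distribution<double> normal(0.0, 1.0);
    const double dt = p.T / p.steps;
    const double sqrt_dt = std::sqrt(dt);
    const double l22 = std::sqrt(1.0 - p.rho * p.rho);  // Cholesky entry L[1][1]
    double log_s[2] = {std::log(p.s0[0]), std::log(p.s0[1])};
    for (int k = 0; k < p.steps; ++k) {
        const double z1 = normal(rng);
        const double z2 = normal(rng);
        const double eps[2] = {z1, p.rho * z1 + l22 * z2};
        for (int a = 0; a < 2; ++a) {
            // Log-space GBM update; exact in distribution for constant parameters.
            log_s[a] += (p.r - 0.5 * p.sigma[a] * p.sigma[a]) * dt
                      + p.sigma[a] * sqrt_dt * eps[a];
        }
    }
    terminal[0] = std::exp(log_s[0]);
    terminal[1] = std::exp(log_s[1]);
}

// 2) Payoff evaluation: a hypothetical call on the average of the two factors.
double basket_call_payoff(const double terminal[2], double strike) {
    return std::max(0.5 * (terminal[0] + terminal[1]) - strike, 0.0);
}

// 3) Aggregation: discounted sample mean and its standard error.
struct Estimate {
    double price;
    double std_err;
};

Estimate price_basket(const ModelParams& p, double strike,
                      std::size_t n_paths, std::uint64_t seed) {
    assert(n_paths > 1);
    std::mt19937_64 rng(seed);  // stand-in for per-thread cuRAND states on the GPU
    double sum = 0.0, sum_sq = 0.0;
    double terminal[2];
    for (std::size_t i = 0; i < n_paths; ++i) {
        simulate_path(p, rng, terminal);
        const double v = basket_call_payoff(terminal, strike);
        sum += v;
        sum_sq += v * v;
    }
    const double n = static_cast<double>(n_paths);
    const double mean = sum / n;
    const double var = std::max((sum_sq - n * mean * mean) / (n - 1.0), 0.0);
    const double discount = std::exp(-p.r * p.T);
    return {discount * mean, discount * std::sqrt(var / n)};
}
```

Setting `rho = 1` with identical factors collapses the basket to a single GBM, so the estimate should land within a few standard errors of the Black-Scholes call price (about 10.45 for S0 = K = 100, r = 0.05, sigma = 0.2, T = 1), which makes a convenient sanity check. The sequential sum here also adds in a different order than a shared-memory tree reduction on the GPU, so aggregated values can differ in the last bits even with identical random inputs.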
The Brief
What you'll do, and what you'll demonstrate.
Port a CPU Monte Carlo risk engine to CUDA on 4x A100 GPUs, targeting a speedup of at least 30x with documented numerical correctness against the CPU baseline.
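A sketch of the "max relative error per metric" computation for the correctness report. It assumes host and device results are already collected as vectors keyed by metric name; the metric names and the `abs_floor` guard are assumptions, not part of the brief. Because the CPU baseline and a cuRAND-based port draw different random numbers, per-path values will not match, so comparing aggregated metrics and reading each error against the Monte Carlo standard error at 1M, 10M, and 100M paths is one reasonable way to interpret the numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Largest |device - host| / |host| over all entries of one metric
// (e.g. one value per portfolio, scenario, or path count).
// abs_floor keeps the ratio finite when a reference value is (near) zero;
// the floor value is an assumption to tune per metric.
double max_relative_error(const std::vector<double>& host,
                          const std::vector<double>& device,
                          double abs_floor = 1e-12) {
    assert(host.size() == device.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < host.size(); ++i) {
        const double denom = std::max(std::fabs(host[i]), abs_floor);
        worst = std::max(worst, std::fabs(device[i] - host[i]) / denom);
    }
    return worst;
}

// One entry of the correctness report per metric name (requires C++17).
// Throws std::out_of_range if the device side is missing a metric.
std::map<std::string, double> correctness_report(
        const std::map<std::string, std::vector<double>>& host,
        const std::map<std::string, std::vector<double>>& device) {
    std::map<std::string, double> report;
    for (const auto& [metric, reference] : host) {
        report[metric] = max_relative_error(reference, device.at(metric));
    }
    return report;
}
```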
Earning criteria: what you'll demonstrate
- Port a real numerical workload from CPU to CUDA correctly
- Use cuRAND and deliberate kernel design to achieve coalesced memory access
- Validate numerical correctness across host and device implementations
- Conduct occupancy and roofline analysis to identify optimization headroom
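A sketch of the arithmetic behind roofline placement: attainable throughput is the lesser of peak compute and arithmetic intensity times memory bandwidth. The peak figures are inputs rather than constants; the example assumes NVIDIA's published A100 40GB numbers (9.7 TFLOP/s FP64 without Tensor Cores, 1,555 GB/s HBM2), and a kernel's FLOP and byte counts would come from a profiler such as Nsight Compute. All helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Peak machine limits for one GPU (supplied by the user, e.g. from a datasheet).
struct Roofline {
    double peak_gflops;  // peak compute throughput, GFLOP/s (e.g. FP64)
    double peak_gbps;    // peak DRAM bandwidth, GB/s
};

// Arithmetic intensity in FLOP/byte: work done per byte moved to or from DRAM.
double arithmetic_intensity(double flops, double bytes) {
    assert(bytes > 0.0);
    return flops / bytes;
}

// Roofline ceiling in GFLOP/s: min(compute roof, intensity * bandwidth).
double attainable_gflops(const Roofline& r, double ai) {
    return std::min(r.peak_gflops, ai * r.peak_gbps);
}

// Intensity at which the bandwidth slope meets the compute roof.
double ridge_point(const Roofline& r) {
    return r.peak_gflops / r.peak_gbps;
}

// Below the ridge point, the ceiling is set by memory bandwidth.
bool is_memory_bound(const Roofline& r, double ai) {
    return ai < ridge_point(r);
}

// Share of the ceiling a measured kernel reaches; 1 - this is the headroom.
double roof_fraction(const Roofline& r, double ai, double measured_gflops) {
    return measured_gflops / attainable_gflops(r, ai);
}
```

With those figures the ridge point is about 6.2 FLOP/byte. A kernel below it is memory-bound, so better coalescing or more data reuse raises its ceiling; a kernel above it is limited by compute throughput instead.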
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.