Overview
What this challenge is about.
You'll receive the MPI weather model (Fortran 2018 + C, ~22,000 lines), 4 weeks of strong-scaling logs (1 to 256 ranks), and access to AWS ParallelCluster with EFA-enabled c7gn instances. Profile collective communication with mpiP and Score-P, and identify the top 3 scaling bottlenecks. The likely candidates are all-to-all collectives dominated by small messages, synchronization in the halo exchange, and load imbalance caused by heterogeneous terrain grids.
Apply fixes: overlap the halo exchange with computation using non-blocking MPI_Isend/MPI_Irecv, batch small messages, replace all-to-all collectives with MPI neighborhood collectives (such as MPI_Neighbor_alltoall) where the communication topology allows, and enable RDMA transport over EFA. Benchmark strong scaling at 64, 128, 256, 512, and 1,024 ranks. The goal is at least 60 percent parallel efficiency at 1,024 ranks.
Deliverables: the patched model, per-bottleneck experiment notes, the strong-scaling curve, the cluster-configuration artifacts, and an 8-page write-up for the research-tech lead.
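Why batching small messages helps can be sketched with the Hockney (alpha-beta) cost model, where every message costs a fixed latency alpha plus beta per byte. The Python below is a minimal sketch under that model. ALPHA_S and BETA_S_PER_B are hypothetical placeholder values, not EFA or c7gn measurements, and pack/unpack are illustrative helpers, not code from the weather model. In an MPI code the packed buffer would go out in one MPI_Isend instead of k separate sends; if both sides already know the layout, the offsets need not be transmitted.

```python
# Hockney (alpha-beta) model: a message of n bytes costs alpha + beta * n.
# Both constants are hypothetical placeholders, not measured EFA/c7gn values.
ALPHA_S = 5e-6        # per-message latency, seconds (hypothetical)
BETA_S_PER_B = 1e-10  # inverse bandwidth, seconds per byte (hypothetical)


def cost_separate(k, msg_bytes, alpha=ALPHA_S, beta=BETA_S_PER_B):
    """k independent messages: each one pays the latency term."""
    return k * (alpha + beta * msg_bytes)


def cost_batched(k, msg_bytes, alpha=ALPHA_S, beta=BETA_S_PER_B):
    """One packed message: latency is paid once, bytes moved are unchanged."""
    return alpha + beta * k * msg_bytes


def pack(messages):
    """Concatenate small payloads into one buffer and record (offset, length)
    for each, so the receiver can split the buffer again."""
    layout, pos = [], 0
    for m in messages:
        layout.append((pos, len(m)))
        pos += len(m)
    return b"".join(messages), layout


def unpack(buffer, layout):
    """Inverse of pack: slice the buffer back into the original payloads."""
    return [buffer[off:off + n] for off, n in layout]


if __name__ == "__main__":
    k, size = 64, 256  # hypothetical: 64 messages of 256 bytes per step
    print(f"separate: {cost_separate(k, size) * 1e6:.1f} us")
    print(f"batched:  {cost_batched(k, size) * 1e6:.1f} us")
    buf, layout = pack([b"aaa", b"bb", b"cccc"])
    assert unpack(buf, layout) == [b"aaa", b"bb", b"cccc"]
```

Under this model batching pays off when the k * alpha term dominates, which is the small-message regime; for large messages the bandwidth term dominates and the gain shrinks. MPI derived datatypes are an alternative to packing by hand.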
The Brief
What you'll do, and what you'll demonstrate.
Diagnose and fix an MPI weather-model scaling collapse so that strong-scaling efficiency reaches at least 60 percent at 1,024 ranks on AWS ParallelCluster.
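Strong-scaling parallel efficiency compares the total rank-seconds of a run against a baseline on the same problem size: E(p) = (p0 * T(p0)) / (p * T(p)). The sketch below assumes the smallest benchmarked rank count is the baseline. The brief does not say which baseline the 60 percent target is measured against (the logs start at 1 rank, the benchmarks at 64), so the write-up should state it explicitly. Function names are hypothetical, and the timings in the usage block are placeholders, not measurements.

```python
def parallel_efficiency(t_base, p_base, t_p, p):
    """Strong-scaling efficiency of a p-rank run relative to a p_base-rank
    baseline on the same problem: E = (p_base * T_base) / (p * T_p).
    1.0 is ideal scaling."""
    return (p_base * t_base) / (p * t_p)


def efficiency_curve(timings):
    """timings maps rank count -> wall-clock seconds (same problem size).
    The smallest rank count is taken as the baseline."""
    p0 = min(timings)
    t0 = timings[p0]
    return {p: parallel_efficiency(t0, p0, t, p) for p, t in sorted(timings.items())}


def max_time_for_target(t_base, p_base, p, target=0.60):
    """Longest wall-clock time at p ranks that still meets the efficiency target."""
    return (p_base * t_base) / (p * target)


if __name__ == "__main__":
    # Hypothetical placeholder timings, NOT measurements from the model.
    timings = {64: 8.0, 128: 4.2, 256: 2.3, 512: 1.4, 1024: 0.9}
    for p, e in efficiency_curve(timings).items():
        print(f"{p:5d} ranks: E = {e:.2f}")
    print(f"1,024-rank budget for E >= 0.60: {max_time_for_target(8.0, 64, 1024):.3f} s")
```

Reporting the time budget from max_time_for_target alongside the curve makes it clear how far each fix still has to go at 1,024 ranks.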
Earning criteria — what you'll demonstrate
- Profile MPI collectives with mpiP + Score-P at scale
- Apply communication-computation overlap with non-blocking MPI
- Configure EFA-enabled clusters for RDMA-aware MPI transport
- Benchmark strong scaling and parallel efficiency rigorously
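The payoff from overlapping the halo exchange with computation can be estimated with an idealized per-step model. The sketch below is a hedged illustration, not the model's actual timestep: it assumes the interior update has no halo dependency and that the MPI library progresses messages while the interior is computed. With some transports, progress happens only inside MPI calls, so occasional MPI_Test calls or an asynchronous-progress setting may be needed. The helper names and the example timings are hypothetical.

```python
def step_time_blocking(t_halo, t_interior, t_boundary):
    """Blocking exchange: wait for the whole halo, then compute everything."""
    return t_halo + t_interior + t_boundary


def step_time_overlapped(t_halo, t_interior, t_boundary):
    """Idealised non-blocking exchange:
      1. post MPI_Irecv / MPI_Isend for the halo,
      2. update interior cells (no halo dependency) while messages are in flight,
      3. MPI_Waitall, then update the boundary cells that read halo data.
    Assumes the MPI library progresses messages during step 2."""
    return max(t_halo, t_interior) + t_boundary


def overlap_saving(t_halo, t_interior, t_boundary):
    """Fraction of the blocking step time removed by overlap (0.0 = no gain)."""
    blocking = step_time_blocking(t_halo, t_interior, t_boundary)
    return 1.0 - step_time_overlapped(t_halo, t_interior, t_boundary) / blocking


if __name__ == "__main__":
    # Hypothetical per-step times in milliseconds.
    print(f"saving: {overlap_saving(2.0, 5.0, 1.0):.0%}")
```

Under strong scaling each rank's interior shrinks faster than its halo (volume versus surface), so t_interior falls toward t_halo at high rank counts and overlap hides less. That is one reason to pair overlap with message batching rather than rely on it alone.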
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.