Overview
What this challenge is about.
You receive (1) a frozen ONNX export of the production model, (2) a 24-hour request trace sampled at 1%, and (3) a sandbox with a single A100-class GPU. Profile the inference path with NVIDIA Nsight Systems and PyTorch Profiler, characterize the batch-size and shape distributions, identify the top three bottlenecks (for example, slow CUDA kernels, host overhead, or batch-padding waste), and prototype the highest-impact fix (for example, dynamic batching, kernel fusion, or quantization to INT8/FP8). Report p50/p99 latency and GPU-hour cost per million predictions before and after the fix. Deliver a 4-page memo for the staff engineer.
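The two headline metrics can be computed in a few lines. The sketch below is illustrative only: the latency samples, throughput figures, and the USD-per-GPU-hour price are hypothetical placeholders, not values from the challenge, and the nearest-rank percentile is one reasonable convention among several.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def gpu_hours_per_million(throughput_per_s):
    """GPU-hours one fully utilised GPU needs to serve 1,000,000 predictions."""
    return 1_000_000 / throughput_per_s / 3600


def cost_per_million(throughput_per_s, usd_per_gpu_hour):
    """USD cost per million predictions at a given sustained throughput."""
    return gpu_hours_per_million(throughput_per_s) * usd_per_gpu_hour


# Hypothetical measurements, for illustration only.
latencies_ms = [4.0, 5.0, 5.5, 6.0, 7.0, 7.5, 8.0, 9.0, 12.0, 40.0]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

# Assumed throughput before/after the fix and an assumed GPU price.
before = cost_per_million(throughput_per_s=2000, usd_per_gpu_hour=3.0)
after = cost_per_million(throughput_per_s=5000, usd_per_gpu_hour=3.0)

print(f"p50={p50} ms, p99={p99} ms")
print(f"cost per million: before=${before:.3f}, after=${after:.3f}")
```

Reporting p99 next to p50 matters because a single slow outlier (the 40 ms sample here) barely moves the median but dominates the tail.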
The Brief
What you'll do, and what you'll demonstrate.
Profile a production-scale recommender, find the top three inference-cost wins, and validate the biggest one with hard before/after numbers.
Earning criteria — what you'll demonstrate
- Profile a real GPU inference path with industry-standard tools
- Quantify the host vs. device time split and identify waste
- Apply dynamic batching, quantization, or kernel fusion in practice
- Connect a millisecond-level win to a monthly cost-savings number
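As a rough illustration of the dynamic-batching criterion above, the sketch below simulates a simple batching policy (dispatch a batch when it is full, or when the first queued request has waited longer than a cap) and then measures how much padding the resulting batches would carry. The function names, the `(arrival_ms, length)` request format, and all numbers are assumptions made for this example; a real serving stack would expose its own batching configuration.

```python
def dynamic_batches(requests, max_batch_size, max_wait_ms):
    """Group (arrival_ms, length) requests into batches.

    A batch is dispatched when it reaches max_batch_size, or when a new
    request arrives more than max_wait_ms after the batch's first request.
    """
    batches, current = [], []
    for req in sorted(requests, key=lambda r: r[0]):  # process in arrival order
        arrival_ms = req[0]
        if current and arrival_ms - current[0][0] > max_wait_ms:
            batches.append(current)  # wait-time cap exceeded: flush
            current = []
        current.append(req)
        if len(current) == max_batch_size:
            batches.append(current)  # size cap reached: flush
            current = []
    if current:
        batches.append(current)  # flush the final partial batch
    return batches


def padding_waste(batches):
    """Fraction of padded elements when each batch is padded to its longest input."""
    padded = sum(max(length for _, length in b) * len(b) for b in batches)
    useful = sum(length for b in batches for _, length in b)
    return 1 - useful / padded


# Hypothetical trace slice: (arrival time in ms, input length).
requests = [(0.0, 12), (1.0, 50), (1.5, 48), (9.0, 10),
            (9.5, 11), (10.0, 9), (10.2, 52)]

batches = dynamic_batches(requests, max_batch_size=4, max_wait_ms=2.0)
waste = padding_waste(batches)

print([len(b) for b in batches])
print(f"padding waste: {waste:.1%}")
```

Larger wait caps tend to produce fuller batches at the price of added queueing latency, which is why the batching parameters are worth sweeping against the p99 target rather than fixing up front.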
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Machine Learning Engineer
Inference profiling and quantization on a real, production-shaped workload is staple work for MLEs on inference-platform teams at hyperscalers.
This challenge sharpens
- inference-optimization
- model-quantization
- gpu-profiling
MLOps Engineer
Owning the cost/latency story for a serving stack and turning profile data into a deployable fix is core MLOps territory on platform teams.
This challenge sharpens
- inference-optimization
- benchmarking
- tensorrt
AI Solutions Architect
Translating millisecond wins into USD/month savings and writing the staff-engineer memo are the skill bridge into AI solutions architecture roles at cloud providers.
This challenge sharpens
- benchmarking
- inference-optimization
- gpu-profiling