Overview
What this challenge is about.
You receive a Gymnasium-compatible warehouse simulator (a 50x50 grid with 8 moving pedestrian obstacles and 20 randomized pick locations) and a baseline A* planner script. Train a DQN agent with experience replay and a target network until it consistently beats A* on a 100-episode evaluation suite. Report mean episode reward, mean travel time per pick, collision rate per 1,000 steps, and wall-clock training time on a single rented L4 GPU. Package everything in a reproducible training script, and write a 2-page memo for the head of engineering with a go/no-go recommendation on a real-robot pilot.
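As a rough illustration of the reporting step, the three per-episode metrics could be aggregated along these lines. This is a minimal sketch under stated assumptions: the `EpisodeResult` record and its field names are hypothetical (the simulator's actual outputs are not specified here), and travel time is assumed to be measured in simulator steps.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EpisodeResult:
    """Per-episode totals. Field names are hypothetical, not taken from the simulator."""
    total_reward: float
    steps: int
    picks_completed: int
    collisions: int


def summarize(episodes):
    """Aggregate evaluation episodes into the reported metrics.

    Assumes travel time is counted in simulator steps. Rates are pooled over
    all episodes so that short episodes do not get extra weight.
    """
    total_steps = sum(e.steps for e in episodes)
    total_picks = sum(e.picks_completed for e in episodes)
    total_collisions = sum(e.collisions for e in episodes)
    nan = float("nan")
    return {
        "mean_episode_reward": mean(e.total_reward for e in episodes),
        "mean_steps_per_pick": total_steps / total_picks if total_picks else nan,
        "collisions_per_1000_steps": (
            1000.0 * total_collisions / total_steps if total_steps else nan
        ),
    }
```

Running the same `summarize` function over the A* planner's episodes and the DQN agent's episodes keeps the two sets of numbers directly comparable.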
The Brief
What you'll do, and what you'll demonstrate.
Train a DQN warehouse-routing policy that beats the production A* planner on travel time and collision rate, then recommend whether to pilot it on real robots.
Earning criteria — what you'll demonstrate
- Implement deep Q-learning with experience replay and a target network
- Design a fair comparison between a learned policy and a classical planner
- Diagnose RL training instability via diagnostic plots
- Translate experimental results into a business recommendation
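For the first criterion, the two stabilizing pieces of DQN can be sketched without any deep learning framework. The snippet below is an illustrative sketch, not the challenge's reference implementation: `ReplayBuffer` and `td_target` are hypothetical names, and the Q-values passed to `td_target` are assumed to come from a separate, periodically synced target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)  # private RNG so sampling is reproducible

    def push(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def td_target(reward, next_q_values_target, done, gamma=0.99):
    """One-step target y = r + gamma * max_a' Q_target(s', a').

    next_q_values_target: Q-values for the next state, assumed to be produced
    by the target network rather than the online network.
    Terminal transitions do not bootstrap.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values_target)
```

In a full training loop, the online network would be regressed toward `td_target` on sampled minibatches, and the target network's weights would be copied from the online network every fixed number of steps.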
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Running a deep RL training campaign with multi-seed evaluation, diagnostic plots, and a written recommendation reflects the day-to-day work of applied ML research at robotics companies.
This challenge sharpens
- deep-q-learning
- reinforcement-learning
- experiment-design
Machine Learning Engineer
Packaging a training pipeline so anyone can rerun it, plus benchmarking against a production baseline, mirrors the MLE craft of shipping research into reproducible systems.
This challenge sharpens
- pytorch
- benchmarking
- experiment-design
Applied AI Scientist
Translating an RL training run into a costed go/no-go memo for engineering leadership is exactly what applied AI scientists ship every quarter.
This challenge sharpens
- reinforcement-learning
- benchmarking
- simulation