Overview
What this challenge is about.
You receive a Python discrete-event simulator whose state is encoded as a 12-dimensional categorical vector (around 8,000 reachable states) and which exposes 6 possible slotting actions, plus 2 years of order-line data to drive simulated demand. Implement tabular Q-learning with epsilon-greedy exploration, train for 200,000 episodes while logging convergence, and validate on a held-out 3 months of demand. Compare the learned policy with the existing rule-based slotting on (a) average picker travel per order line and (b) the percentage of high-velocity SKUs in golden zones. Success is a reduction of at least 12 percent in picker travel with no worsening of the golden-zone metric.
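As a rough illustration of the learning step described above, the sketch below shows a one-step Q-learning update with epsilon-greedy action selection. It assumes each state is a hashable 12-tuple of category indices. The function names, the `alpha` and `gamma` defaults, and the dictionary-backed table are illustrative assumptions, not part of the provided simulator.

```python
import random
from collections import defaultdict

N_ACTIONS = 6  # the brief specifies 6 slotting actions


def make_q_table():
    """Q-table keyed by state tuple; rows are created lazily, so only
    states that are actually visited take up memory."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon pick a uniformly random action,
    otherwise the action with the highest Q-value (first index on ties)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    return max(range(N_ACTIONS), key=values.__getitem__)


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.95):
    """One-step Q-learning update.
    alpha (learning rate) and gamma (discount) are assumed defaults."""
    # Bootstrap from the best next-state value unless the episode ended.
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

In a training loop, each simulator step would call `epsilon_greedy` to choose a slotting action and `q_update` with the observed reward and next state. A natural choice of reward is something like negative picker travel, though that depends on what the simulator reports.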
The Brief
What you'll do, and what you'll demonstrate.
Use tabular Q-learning to learn a warehouse slotting policy that materially beats the current rule-based heuristic on simulated demand.
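One way to make "materially beats" concrete is to encode the Overview's success criterion as a small check. The sketch below is a minimal version under stated assumptions: the function name and argument names are hypothetical, and the numbers in the usage lines are made up purely to show the call, not results from the simulator.

```python
def meets_success_criteria(baseline_travel, policy_travel,
                           baseline_golden_pct, policy_golden_pct,
                           min_reduction=0.12):
    """Compare a learned policy with the rule-based baseline.

    Travel arguments are average picker travel per order line (any
    consistent unit); golden arguments are the percentage of
    high-velocity SKUs in golden zones.
    Returns (passed, relative_travel_reduction).
    """
    if baseline_travel <= 0:
        raise ValueError("baseline_travel must be positive")
    reduction = (baseline_travel - policy_travel) / baseline_travel
    # Pass only if travel drops enough AND the golden-zone metric
    # is no worse than the baseline.
    passed = (reduction >= min_reduction
              and policy_golden_pct >= baseline_golden_pct)
    return passed, reduction


# Illustrative values only (hypothetical, not measured):
passed, reduction = meets_success_criteria(100.0, 85.0, 70.0, 72.0)
print(f"passed={passed}, travel reduction={reduction:.1%}")
```

Both metrics should be computed on the held-out 3 months of demand for the baseline and the learned policy alike, so the comparison is like for like.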
Earning criteria — what you'll demonstrate
- Implement tabular Q-learning with epsilon-greedy exploration from scratch
- Tune exploration schedules and learning rates for convergence
- Evaluate RL policies against a non-RL baseline on operationally meaningful metrics
- Communicate RL results to a non-technical operations audience
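For the schedule-tuning criterion above, a common starting point is to decay both epsilon and the learning rate geometrically over the training run. The sketch below is one possible schedule, not a prescribed one: the function name and the start and end values in the usage lines are assumptions to be tuned against the logged convergence curves.

```python
def exponential_decay(start, end, episode, total_episodes):
    """Geometric interpolation from `start` at episode 0 to `end`
    at the final episode (total_episodes - 1)."""
    if total_episodes < 2 or start <= 0 or end <= 0:
        raise ValueError("need total_episodes >= 2 and positive start/end")
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(episode / (total_episodes - 1), 0.0), 1.0)
    return start * (end / start) ** frac


TOTAL_EPISODES = 200_000  # training length from the brief

# Assumed endpoints; these are tuning knobs, not recommended values.
for episode in (0, 50_000, 100_000, 199_999):
    epsilon = exponential_decay(1.0, 0.01, episode, TOTAL_EPISODES)
    alpha = exponential_decay(0.5, 0.01, episode, TOTAL_EPISODES)
    print(f"episode {episode:>7}: epsilon={epsilon:.4f}, alpha={alpha:.4f}")
```

A slower epsilon decay keeps exploration going for longer, which can help cover more of the roughly 8,000 reachable states, at the cost of noisier early returns. A linear schedule is an equally reasonable alternative to try.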
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Machine Learning Engineer
Implementing tabular RL on a real operational simulator and beating a rule-based baseline is the kind of MLE win that ships in logistics products.
This challenge sharpens
- tabular-rl
- q-learning
- python
Data Scientist
Validating an RL policy against a heuristic baseline on operationally meaningful metrics is the daily craft of data scientists in logistics and operations roles.
This challenge sharpens
- policy-evaluation
- simulation
- python
Applied AI Scientist
Choosing tabular RL over deep RL because the state space allows it is exactly the kind of judgement applied AI scientists exercise.
This challenge sharpens
- tabular-rl
- epsilon-greedy
- policy-evaluation