Tabular Q-Learning for Warehouse Slotting

Free · Verified credential · 2 weeks · Intermediate

Overview

What this challenge is about.

You receive a Python discrete-event simulator with state encoded as a 12-dimensional categorical vector (around 8,000 reachable states) and 6 possible slotting actions, plus 2 years of order-line data to drive simulator demand. Implement tabular Q-learning with epsilon-greedy exploration, train for 200K episodes with logged convergence, and validate on a held-out 3 months of demand. Compare to the existing rule-based slotting on (a) average picker travel per order line and (b) percentage of high-velocity SKUs in golden zones. Success is a 12 percent or better reduction in picker travel with no worsening of the golden-zone metric.
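As a rough sketch of the core loop, tabular Q-learning with epsilon-greedy exploration might look like the following. The brief does not specify the simulator's API, so the `reset()` / `step(action)` interface, the `ToyEnv` stand-in, the reward definition, and all hyperparameter values here are assumptions for illustration only.

```python
import random
from collections import defaultdict

N_ACTIONS = 6  # six slotting actions, per the brief


class ToyEnv:
    """Hypothetical stand-in for the slotting simulator (not its real API)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return (self.t,)  # the real state would be a 12-tuple of categories

    def step(self, action):
        # Toy reward: action 3 is always best. In the real task the reward
        # might be, for example, negative picker travel (an assumption).
        reward = 1.0 if action == 3 else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return (self.t,), reward, done


def train(env, episodes, alpha=0.1, gamma=0.95,
          eps_start=1.0, eps_end=0.05, seed=0):
    rng = random.Random(seed)
    # Q-table keyed by state tuple; unseen states start at zero.
    q = defaultdict(lambda: [0.0] * N_ACTIONS)
    for ep in range(episodes):
        # Linear epsilon decay from eps_start to eps_end over training.
        eps = eps_start + (eps_end - eps_start) * ep / max(1, episodes - 1)
        state, done = env.reset(), False
        while not done:
            if rng.random() < eps:
                action = rng.randrange(N_ACTIONS)  # explore
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])  # exploit
            next_state, reward, done = env.step(action)
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train(ToyEnv(), episodes=2000)
greedy = {s: max(range(N_ACTIONS), key=lambda a: v[a]) for s, v in q_table.items()}
```

With around 8,000 reachable states and 6 actions, the full table has roughly 48,000 entries, which is why a dictionary or a dense array is a workable choice here. Logging the per-episode return or the largest Q-value change is one simple way to track convergence over the 200K episodes.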

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Use tabular Q-learning to learn a warehouse slotting policy that materially beats the current rule-based heuristic on simulated demand.

Earning criteria — what you'll demonstrate

Implement tabular Q-learning with epsilon-greedy exploration from scratch
Tune exploration schedules and learning rates for convergence
Evaluate RL policies against a non-RL baseline on operationally meaningful metrics
Communicate RL results to a non-technical operations audience
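The baseline comparison can be reduced to a small check. The sketch below encodes the stated success rule (at least a 12 percent reduction in average picker travel per order line, with no worsening of the golden-zone percentage). The function name, argument names, and example numbers are hypothetical and are not results from the challenge.

```python
def meets_success_criteria(baseline_travel, policy_travel,
                           baseline_golden_pct, policy_golden_pct,
                           min_reduction=0.12):
    """Return (passed, relative_reduction) for a learned policy vs. the baseline.

    Travel values are average picker travel per order line (any consistent unit).
    Golden-zone values are the percentage of high-velocity SKUs in golden zones.
    """
    if baseline_travel <= 0:
        raise ValueError("baseline_travel must be positive")
    reduction = (baseline_travel - policy_travel) / baseline_travel
    golden_ok = policy_golden_pct >= baseline_golden_pct
    return (reduction >= min_reduction and golden_ok), reduction


# Illustrative numbers only, made up for this example.
passed, reduction = meets_success_criteria(
    baseline_travel=100.0, policy_travel=85.0,
    baseline_golden_pct=70.0, policy_golden_pct=72.0,
)
```

Reporting the relative reduction alongside the pass/fail flag keeps the result readable for an operations audience that thinks in travel distance rather than Q-values.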

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Reinforcement Learning

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Machine Learning Engineer

Implementing tabular RL on a real operational simulator and beating a rule-based baseline is the kind of MLE win that ships in logistics products.

This challenge sharpens

tabular-rl
q-learning
python

Data Scientist

Validating an RL policy against a heuristic baseline on operationally meaningful metrics is the daily craft of data scientists in logistics and operations roles.

This challenge sharpens

policy-evaluation
simulation
python

Applied AI Scientist

Choosing tabular RL over deep RL because the state space allows it is exactly the kind of judgement applied AI scientists exercise.

This challenge sharpens

tabular-rl
epsilon-greedy
policy-evaluation

One more thing

You can put a credential on your CV by Friday.
