Overview
What this challenge is about.
You receive 5,000 logged trajectories of (state, action, reward, next-state) transitions across 12 tasks: 9 for training and 3 held out. Train an offline RL algorithm (CQL or IQL recommended) on the 9 training tasks, then evaluate the trained policy in the simulator versions of the 3 held-out tasks. Report two metrics per task: zero-shot success rate, and few-shot success rate after 100 online interactions. Compare both against a behavior-cloning (BC) baseline trained on the same data. The run counts as a success if the policy shows a zero-shot lift over BC on at least 2 of the 3 held-out tasks and a few-shot lift on all 3.
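The success rule above can be written down as a small check. This is a minimal sketch, not part of the official harness: the `TaskResult` record and the `meets_success_criterion` function are hypothetical names, "lift" is assumed to mean a strictly higher success rate, and the BC baseline is assumed to be scored under the same few-shot protocol, which the brief does not spell out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskResult:
    """Success rates (0.0 to 1.0) on one held-out task. Hypothetical record type."""
    task: str
    rl_zero_shot: float
    bc_zero_shot: float
    rl_few_shot: float  # after the 100 online interactions
    bc_few_shot: float  # assumption: BC is scored under the same protocol


def meets_success_criterion(
    results: Sequence[TaskResult], min_zero_shot_wins: int = 2
) -> bool:
    """Zero-shot lift on at least `min_zero_shot_wins` tasks, few-shot lift on all."""
    # Assumption: a "lift" is a strictly higher success rate than BC.
    zero_shot_wins = sum(r.rl_zero_shot > r.bc_zero_shot for r in results)
    few_shot_everywhere = all(r.rl_few_shot > r.bc_few_shot for r in results)
    return zero_shot_wins >= min_zero_shot_wins and few_shot_everywhere
```

With only three held-out tasks, a strict greater-than comparison is sensitive to evaluation noise, so it may be worth averaging each success rate over many rollouts before applying the check.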
The Brief
What you'll do, and what you'll demonstrate.
Train an offline RL policy on logged trajectories and show that it improves zero-shot and few-shot success rates on held-out tasks over a behavior-cloning (BC) baseline.
Earning criteria — what you'll demonstrate
- Apply a modern offline RL algorithm (CQL or IQL) on real logged data
- Design a held-out task split for skill-reuse evaluation
- Compare offline RL to imitation baselines fairly
- Communicate offline-RL value to a consultancy's solutions team
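One way to keep the RL-versus-BC comparison fair is to fix the task split once and feed both learners exactly the same training trajectories. The sketch below assumes each trajectory is a dict with a `"task"` key and uses a seeded random split; both are illustrative assumptions, and `split_tasks` and `filter_trajectories` are hypothetical helper names. A random split is only a starting point, since a split designed for skill reuse may need to be chosen by hand.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_tasks(
    task_ids: Sequence[str], n_held_out: int = 3, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Return (train_tasks, held_out_tasks) from a reproducible shuffle."""
    if not 0 < n_held_out < len(task_ids):
        raise ValueError("n_held_out must be between 1 and len(task_ids) - 1")
    shuffled = sorted(task_ids)  # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)
    held_out = sorted(shuffled[:n_held_out])
    train = sorted(shuffled[n_held_out:])
    return train, held_out


def filter_trajectories(
    trajectories: Sequence[Dict], allowed_tasks: Sequence[str]
) -> List[Dict]:
    """Keep only trajectories whose (assumed) "task" field is in allowed_tasks."""
    allowed = set(allowed_tasks)
    return [t for t in trajectories if t["task"] in allowed]
```

Filtering the logged data through `filter_trajectories` with the training tasks before fitting either model helps ensure that no held-out trajectory leaks into the offline RL policy or the BC baseline.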
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Applied AI Scientist
Turning logged operational data into usable pre-trained offline-RL skills is the daily work of applied AI scientists in industrial robotics.
This challenge sharpens
- offline-rl
- skill-reuse
- policy-evaluation
ML Researcher
Designing held-out task splits and comparing offline RL to imitation baselines is research-engineering work that opens doors on robot-learning teams.
This challenge sharpens
- offline-rl
- conservative-q-learning
- imitation-learning
Machine Learning Engineer
Wiring d3rlpy, a simulator, and an evaluation harness into a reusable consultancy tool is core MLE work in industrial AI.
This challenge sharpens
- pytorch
- offline-rl
- policy-evaluation