Overview
What this challenge is about.
You receive a PyBullet pick-and-place environment (Franka Panda arm, 12 object types, randomized starting poses) and a SAC baseline that reaches 85% success after about 1.5 million environment steps. Implement DreamerV3 or a similar latent-dynamics world model, with a policy that is trained on imagined rollouts. Measure success rate against environment steps for both methods up to 1 million steps, averaging over 5 seeds. Write a 2-page memo on the engineering ROI of model-based RL (MBRL) for this task class.
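One way to do the seed averaging is sketched below. It assumes every seed was evaluated at the same environment-step checkpoints and that success rates are stored as fractions in [0, 1]. The function name `aggregate_seeds` and the input layout are illustrative assumptions, not part of the provided environment or baseline.

```python
from math import sqrt
from statistics import mean, stdev


def aggregate_seeds(curves):
    """Average per-seed success curves checkpoint by checkpoint.

    curves: list with one entry per seed; each entry is a list of success
    rates in [0, 1], evaluated at the same environment-step checkpoints
    (an assumed layout, adapt it to your own logging format).
    Returns (means, standard_errors), one value per checkpoint.
    """
    if not curves:
        raise ValueError("need at least one seed")
    n_points = len(curves[0])
    if any(len(c) != n_points for c in curves):
        raise ValueError("all seeds must share the same checkpoints")

    n_seeds = len(curves)
    means, sems = [], []
    # zip(*curves) groups the values of all seeds at each checkpoint.
    for values in zip(*curves):
        means.append(mean(values))
        # Standard error of the mean; undefined for one seed, so report 0.0.
        sems.append(stdev(values) / sqrt(n_seeds) if n_seeds > 1 else 0.0)
    return means, sems
```

Plotting the mean with a band of plus or minus one standard error for both agents on the same step axis makes it visible whether a gap between the curves is larger than the seed-to-seed spread.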
The Brief
What you'll do, and what you'll demonstrate.
Quantify the sample-efficiency advantage of a model-based RL agent over a strong model-free baseline on a realistic manipulation task.
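A common way to put a number on sample efficiency is steps-to-threshold: the first evaluation checkpoint at which the seed-averaged success rate reaches a target level. The sketch below is one possible definition, not a prescribed metric; `steps_to_threshold` and `efficiency_ratio` are hypothetical helper names, and the threshold is yours to choose.

```python
def steps_to_threshold(steps, success, threshold):
    """Return the first checkpoint whose success rate is >= threshold.

    steps: environment-step counts at each evaluation, in increasing order.
    success: success rates in [0, 1] at those checkpoints.
    Returns None if the threshold is never reached within the budget.
    """
    if len(steps) != len(success):
        raise ValueError("steps and success must have the same length")
    for step, rate in zip(steps, success):
        if rate >= threshold:
            return step
    return None


def efficiency_ratio(baseline_steps, candidate_steps):
    """Baseline steps divided by candidate steps; above 1 favours the candidate.

    Returns None when either agent never reached the threshold, because a
    ratio is not defined in that case.
    """
    if baseline_steps is None or candidate_steps is None:
        return None
    if candidate_steps <= 0:
        raise ValueError("candidate_steps must be positive")
    return baseline_steps / candidate_steps
```

Because the baseline is reported to need about 1.5 million steps for 85% success, it may not cross a high threshold inside a 1-million-step budget. The `None` return makes that case explicit, so you can report it as such or compare at a lower threshold that both agents reach.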
Earning criteria — what you'll demonstrate
- Implement a latent-dynamics world model for control
- Compare model-based vs. model-free RL fairly on sample efficiency
- Run controlled ablations on world-model hyperparameters
- Reason about engineering ROI of complex RL methods in production
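For the ablation criterion, one controlled design is to change a single world-model hyperparameter at a time around a fixed default configuration and repeat every variant over the same seeds. The sketch below only builds that run list. The hyperparameter names used in the usage note (`imagination_horizon`, `latent_dim`) are hypothetical placeholders, not settings defined by the challenge.

```python
def one_factor_ablations(defaults, alternatives, seeds):
    """Build run configs that vary one hyperparameter at a time.

    defaults: dict with the reference value of every hyperparameter.
    alternatives: dict mapping a hyperparameter name to the values to try.
    seeds: iterable of random seeds applied to every configuration.
    Returns a list of dicts, each a full config plus a "seed" entry.
    """
    seeds = list(seeds)
    configs = [dict(defaults)]  # the unmodified reference configuration
    for name, values in alternatives.items():
        if name not in defaults:
            raise KeyError(f"unknown hyperparameter: {name}")
        for value in values:
            if value != defaults[name]:  # skip duplicates of the reference
                configs.append({**defaults, name: value})
    # Repeat every configuration for every seed.
    return [{**config, "seed": seed} for config in configs for seed in seeds]
```

For example, with defaults `{"imagination_horizon": 15, "latent_dim": 32}`, alternatives `{"imagination_horizon": [5, 30], "latent_dim": [16]}`, and 5 seeds, this yields 4 configurations and 20 runs. The values are placeholders chosen only to show the shape of the call.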
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Research Scientist
Implementing a recent research method (Dreamer) and running rigorous ablations against a strong baseline is exactly the work expected of a junior research scientist on an RL team.
This challenge sharpens
- model-based-rl
- world-models
- experiment-design
ML Researcher
Sample-efficiency comparisons with proper compute accounting address the kind of practical research question that ML researchers answer for product teams.
This challenge sharpens
- reinforcement-learning
- experiment-design
- pytorch
Applied AI Scientist
Translating an RL research result into an engineering-ROI memo is core applied-AI-scientist work in any robotics company.
This challenge sharpens
- model-based-rl
- manipulation
- world-models