Overview
What this challenge is about.
You receive a quadruped locomotion environment (Isaac Lab or pybullet-quadruped), the original five-term reward function, and a budget of 6 training runs. Design 4 reward variants by tuning the torque-penalty and energy-penalty weights, plus one variant that uses a curriculum to anneal the torque penalty over training. Train PPO on each with an identical compute budget. Evaluate every policy on 50 randomized terrains, reporting success rate, average forward velocity, fall rate, and average torque. Pick one reward function and recommend it in a 2-page memo. Success means at least a 20-percentage-point improvement in success rate over the original reward.
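The reward variants described above can be sketched as follows. This is a minimal, framework-agnostic sketch, not Isaac Lab or pybullet code. The squared-torque and absolute-power forms of the two penalties, the `base_reward` placeholder standing in for the original reward's other terms, the linear annealing schedule, and every weight value are assumptions chosen for illustration; the names `PenaltyWeights`, `curriculum_weights`, and `FIXED_VARIANTS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PenaltyWeights:
    """Hypothetical container for the two weights the variants tune."""
    torque: float  # w_tau: scales the squared-torque penalty
    energy: float  # w_e: scales the mechanical-power penalty


def torque_penalty(torques: Sequence[float]) -> float:
    """Sum of squared joint torques (one common form of torque penalty)."""
    return sum(t * t for t in torques)


def energy_penalty(torques: Sequence[float], joint_vels: Sequence[float]) -> float:
    """Sum of |tau_i * qdot_i|, i.e. absolute mechanical power per joint."""
    return sum(abs(t * v) for t, v in zip(torques, joint_vels))


def annealed_weight(step: int, total_steps: int, w_start: float, w_end: float) -> float:
    """Linearly interpolate a penalty weight from w_start to w_end over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return w_start + frac * (w_end - w_start)


def shaped_reward(base_reward: float, torques: Sequence[float],
                  joint_vels: Sequence[float], w: PenaltyWeights) -> float:
    """Other reward terms minus the weighted torque and energy penalties.

    base_reward is a placeholder (assumption) for the sum of the original
    reward's remaining terms, e.g. velocity tracking and stability.
    """
    return (base_reward
            - w.torque * torque_penalty(torques)
            - w.energy * energy_penalty(torques, joint_vels))


# Hypothetical 2x2 sweep over the two weights: the 4 fixed-weight variants.
FIXED_VARIANTS = {
    "low_torque_low_energy": PenaltyWeights(torque=1e-5, energy=1e-4),
    "high_torque_low_energy": PenaltyWeights(torque=1e-4, energy=1e-4),
    "low_torque_high_energy": PenaltyWeights(torque=1e-5, energy=1e-3),
    "high_torque_high_energy": PenaltyWeights(torque=1e-4, energy=1e-3),
}


def curriculum_weights(step: int, total_steps: int) -> PenaltyWeights:
    """Hypothetical curriculum variant: torque weight decays 5e-4 -> 1e-4,
    energy weight held fixed. Endpoints and direction are assumptions."""
    return PenaltyWeights(torque=annealed_weight(step, total_steps, 5e-4, 1e-4),
                          energy=1e-3)
```

A linear schedule is the simplest annealing choice. The overview does not say whether the torque penalty should decay or ramp up over training, so swapping the two endpoints in `curriculum_weights` gives the opposite curriculum.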
The Brief
What you'll do, and what you'll demonstrate.
Rework the locomotion reward function so the trained policy tolerates higher torque noise without sacrificing forward velocity or stability.
Earning criteria — what you'll demonstrate
- Apply principled reward shaping to a deep-RL locomotion task
- Use curriculum-style reward annealing for stability
- Evaluate locomotion policies on multiple operational metrics, not just success rate
- Communicate reward-engineering choices to an applied robotics team
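The multi-metric evaluation from the overview (success rate, fall rate, average forward velocity, and average torque over 50 randomized terrains) and the 20-point success criterion can be sketched as below. The per-episode schema is a hypothetical one, and reading "average torque" as mean absolute joint torque is an assumption; how success and falls are detected depends on the environment and is not shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EpisodeResult:
    """One evaluation episode on one randomized terrain (hypothetical schema)."""
    success: bool           # policy completed the terrain
    fell: bool              # episode ended in a fall
    forward_vel: float      # mean forward base velocity over the episode (m/s)
    mean_abs_torque: float  # mean |tau| over joints and timesteps (N*m), assumed metric


def summarize(results: List[EpisodeResult]) -> Dict[str, float]:
    """Aggregate per-episode results into the four reported metrics."""
    if not results:
        raise ValueError("need at least one episode")
    n = len(results)
    return {
        "success_rate": 100.0 * sum(r.success for r in results) / n,  # percent
        "fall_rate": 100.0 * sum(r.fell for r in results) / n,        # percent
        "avg_forward_vel": mean(r.forward_vel for r in results),
        "avg_torque": mean(r.mean_abs_torque for r in results),
    }


def meets_target(candidate: Dict[str, float], baseline: Dict[str, float],
                 min_gain_pts: float = 20.0) -> bool:
    """True if success rate improves by at least min_gain_pts percentage points."""
    return candidate["success_rate"] - baseline["success_rate"] >= min_gain_pts
```

Success and fall rates are tracked separately because an episode can also end by timing out without either, so the two need not sum to 100%.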
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Machine Learning Engineer
Reward shaping around a real hardware constraint is the kind of MLE judgment work that ships robotics products.
This challenge sharpens
- reward-shaping
- ppo
- deep-rl
ML Researcher
Structured ablation across reward formulations is research-engineering work that opens doors at robot-learning labs.
This challenge sharpens
- reward-shaping
- curriculum-learning
- policy-evaluation
Research Scientist
Multi-metric locomotion evaluation that accounts for seed variance is the kind of rigor research-scientist roles require.
This challenge sharpens
- policy-evaluation
- deep-rl
- locomotion