Reward Shaping for a Quadruped Locomotion Policy

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive a quadruped locomotion environment (Isaac Lab or pybullet-quadruped), the previous reward function (5 terms), and a budget of 6 training runs. Design 4 reward variants by tuning the torque-penalty and energy-penalty weights, plus one variant with a curriculum that anneals the torque penalty over training. Train PPO on each variant with an identical compute budget. Evaluate each policy on 50 randomized terrains, reporting success rate, average forward velocity, fall rate, and average torque. Pick one reward function and recommend it in a 2-page memo. Success means at least a 20-point improvement in success rate over the original reward.
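As a rough illustration of the curriculum variant described above, the sketch below applies torque and energy penalties to a base reward and interpolates the torque weight linearly over training. Everything named here (`RewardWeights`, `annealed_torque_weight`, `shaped_reward`, the squared-torque and mechanical-power cost forms, and the linear schedule) is a hypothetical choice for illustration, not the challenge's actual reward function or any Isaac Lab or PyBullet API. The brief does not say which direction the torque penalty should move, so the start and end weights are left as parameters.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardWeights:
    """Hypothetical penalty weights for one reward variant."""
    torque: float
    energy: float


def annealed_torque_weight(step: int, total_steps: int,
                           start: float, end: float) -> float:
    """Linearly interpolate the torque weight from `start` to `end`.

    The linear schedule is an assumption; the brief only says the
    torque penalty is annealed over training.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the horizon hold the end value.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def shaped_reward(base_reward: float,
                  torques: Sequence[float],
                  joint_velocities: Sequence[float],
                  weights: RewardWeights) -> float:
    """Subtract weighted torque and energy costs from a base reward.

    Assumed cost forms: sum of squared joint torques, and sum of
    absolute mechanical power |torque * joint velocity|.
    """
    torque_cost = sum(t * t for t in torques)
    energy_cost = sum(abs(t * v) for t, v in zip(torques, joint_velocities))
    return base_reward - weights.torque * torque_cost - weights.energy * energy_cost


# Example: halfway through training with an assumed 0.0 -> 1.0 schedule.
w = RewardWeights(torque=annealed_torque_weight(50, 100, 0.0, 1.0), energy=0.01)
r = shaped_reward(1.0, [1.0, 2.0], [1.0, -1.0], w)
```

Keeping the schedule as a pure function of the training step makes it easy to log the effective weight alongside each run, which helps when comparing the curriculum variant against the fixed-weight variants.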

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Rework the locomotion reward function to handle higher torque noise without sacrificing forward velocity or stability.

Earning criteria — what you'll demonstrate

Apply principled reward shaping to a deep-RL locomotion task
Use curriculum-style reward annealing for stability
Evaluate locomotion policies on multiple operational metrics, not just success
Communicate reward-engineering choices to an applied robotics team
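
A minimal sketch of the multi-metric evaluation named in the criteria above, assuming each terrain rollout is summarized as one record. The `EpisodeResult` fields, the `summarize` and `meets_target` helpers, and the reading of "20-point" as 20 percentage points are illustrative assumptions, not part of the challenge materials.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EpisodeResult:
    """Hypothetical per-terrain rollout summary."""
    success: bool            # reached the goal on this terrain
    fell: bool               # the robot fell during the rollout
    forward_velocity: float  # mean forward velocity over the rollout
    mean_torque: float       # mean absolute joint torque over the rollout


def summarize(results: List[EpisodeResult]) -> Dict[str, float]:
    """Aggregate the four metrics; rates are reported in percent."""
    if not results:
        raise ValueError("no episodes to summarize")
    n = len(results)
    return {
        "success_rate": 100.0 * sum(r.success for r in results) / n,
        "fall_rate": 100.0 * sum(r.fell for r in results) / n,
        "avg_forward_velocity": mean(r.forward_velocity for r in results),
        "avg_torque": mean(r.mean_torque for r in results),
    }


def meets_target(baseline: Dict[str, float], candidate: Dict[str, float],
                 min_gain_points: float = 20.0) -> bool:
    """Check the success-rate gain, assumed to be in percentage points."""
    gain = candidate["success_rate"] - baseline["success_rate"]
    return gain >= min_gain_points


# Toy data for illustration only; real runs would use 50 terrains per policy.
baseline = summarize([
    EpisodeResult(True, False, 0.8, 12.0),
    EpisodeResult(False, True, 0.3, 15.0),
    EpisodeResult(False, True, 0.2, 16.0),
    EpisodeResult(False, False, 0.5, 13.0),
])
candidate = summarize([
    EpisodeResult(True, False, 0.9, 10.0),
    EpisodeResult(True, False, 0.7, 11.0),
    EpisodeResult(False, True, 0.4, 14.0),
    EpisodeResult(False, False, 0.6, 12.0),
])
```

Reporting all four numbers side by side makes trade-offs visible, for example a variant that raises success rate but also raises average torque.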

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Robot Learning

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Machine Learning Engineer
AI Engineering

Machine Learning Engineer

Reward shaping under a real hardware constraint is the kind of MLE judgement work that ships robotics products.

This challenge sharpens

reward-shaping
ppo
deep-rl

ML Researcher

Structured ablation across reward formulations is research-engineering work that opens doors at robot-learning labs.

This challenge sharpens

reward-shaping
curriculum-learning
policy-evaluation

Research Scientist

Multi-metric locomotion evaluation with seed variance is the kind of rigor that research-scientist roles require.

This challenge sharpens

policy-evaluation
deep-rl
locomotion

One more thing

You can put a credential on your CV by Friday.
