Research

Reward Shaping for a Quadruped Locomotion Policy

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive a quadruped locomotion environment (Isaac Lab or pybullet-quadruped), the previous reward function (5 terms), and a budget of 6 training runs. Design 4 reward variants by tuning the torque-penalty and energy-penalty weights, plus one variant with a curriculum that anneals the torque penalty over training. Train PPO on each variant with an identical compute budget. Evaluate each policy on 50 randomized terrains, reporting success rate, average forward velocity, fall rate, and average torque. Pick one reward function and recommend it in a 2-page memo. Success means at least a 20-point improvement in success rate over the original reward.
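As a rough illustration of the variants described above, the sketch below shows one way to parameterize the torque and energy penalties and to anneal the torque weight linearly over training. Everything in it is an assumption made for illustration: the names `PenaltyWeights`, `annealed_torque_weight`, `penalty_terms`, and `VARIANTS` are hypothetical, the weight values are placeholders rather than tuned recommendations, the energy cost is assumed to be the absolute mechanical power |torque × joint velocity|, and the brief does not say in which direction or with which schedule the torque penalty should be annealed. It does not use the Isaac Lab or pybullet APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PenaltyWeights:
    torque: float  # weight on the sum of squared joint torques
    energy: float  # weight on |torque * joint velocity| (assumed energy proxy)


def annealed_torque_weight(step: int, total_steps: int,
                           start: float, end: float) -> float:
    """Linearly interpolate the torque weight from `start` to `end`.

    The linear shape and the direction (start -> end) are assumptions;
    progress is clamped to [0, 1] so steps past the horizon return `end`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def penalty_terms(torques, joint_vels, w: PenaltyWeights) -> float:
    """Return the (non-positive) penalty contribution to one step's reward."""
    torque_cost = sum(t * t for t in torques)
    energy_cost = sum(abs(t * v) for t, v in zip(torques, joint_vels))
    return -(w.torque * torque_cost + w.energy * energy_cost)


# Hypothetical weight grid for the four fixed-weight variants (placeholder values).
VARIANTS = {
    "low_torque_low_energy": PenaltyWeights(torque=1e-5, energy=1e-4),
    "low_torque_high_energy": PenaltyWeights(torque=1e-5, energy=1e-3),
    "high_torque_low_energy": PenaltyWeights(torque=1e-4, energy=1e-4),
    "high_torque_high_energy": PenaltyWeights(torque=1e-4, energy=1e-3),
}

if __name__ == "__main__":
    # Curriculum variant: ramp the torque weight up over a 1000-step horizon.
    for step in (0, 500, 1000):
        print(step, annealed_torque_weight(step, 1000, start=0.0, end=1e-4))
```

In a training loop, the curriculum variant would rebuild `PenaltyWeights` each update with the annealed torque weight, while the four fixed variants keep their weights constant; the remaining terms of the original 5-term reward would be added unchanged.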

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Rework the locomotion reward function to handle higher torque noise without sacrificing forward velocity or stability.

Earning criteria — what you'll demonstrate

  • Apply principled reward shaping to a deep-RL locomotion task
  • Use curriculum-style reward annealing for stability
  • Evaluate locomotion policies on multiple operational metrics, not just success
  • Communicate reward-engineering choices to an applied robotics team
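To make the multi-metric evaluation in these criteria concrete, here is a minimal sketch of how per-episode results from the randomized terrains might be aggregated into the four reported metrics and checked against the 20-point success-rate target. The `Episode` record, its field names, the units, and the helpers `summarize` and `meets_target` are hypothetical; how "success" and "fall" are defined per episode is left to the environment and is assumed here to be given as booleans.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Episode:
    success: bool            # assumed: episode-level success flag from the env
    fell: bool               # assumed: True if the robot fell during the episode
    forward_velocity: float  # assumed units: m/s, averaged over the episode
    mean_torque: float       # assumed units: N*m, averaged over joints and steps


def summarize(episodes):
    """Aggregate per-episode records into the four metrics (rates in percent)."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    return {
        "success_rate": 100.0 * sum(e.success for e in episodes) / n,
        "fall_rate": 100.0 * sum(e.fell for e in episodes) / n,
        "avg_forward_velocity": mean(e.forward_velocity for e in episodes),
        "avg_torque": mean(e.mean_torque for e in episodes),
    }


def meets_target(candidate, baseline, min_gain_points=20.0):
    """True if the candidate's success rate beats the baseline by the margin."""
    gain = candidate["success_rate"] - baseline["success_rate"]
    return gain >= min_gain_points


if __name__ == "__main__":
    # Toy records, not real results.
    baseline = summarize([
        Episode(True, False, 1.0, 10.0),
        Episode(False, True, 0.5, 12.0),
    ])
    candidate = summarize([
        Episode(True, False, 1.1, 9.0),
        Episode(True, False, 0.9, 9.5),
        Episode(True, False, 1.0, 8.5),
        Episode(False, True, 0.4, 11.0),
    ])
    print(baseline, candidate, meets_target(candidate, baseline))
```

Reporting all four metrics side by side keeps a variant from winning on success rate alone while quietly raising average torque or fall rate.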

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Machine Learning Engineer

Reward shaping under a real hardware constraint is the kind of judgement work machine learning engineers do when shipping robotics products.

This challenge sharpens

  • reward-shaping
  • ppo
  • deep-rl

ML Researcher

Structured ablation across reward formulations is research-engineering work that opens doors at robot-learning labs.

This challenge sharpens

  • reward-shaping
  • curriculum-learning
  • policy-evaluation

Research Scientist

Multi-metric locomotion evaluation with seed variance is the rigor research-scientist roles need.

This challenge sharpens

  • policy-evaluation
  • deep-rl
  • locomotion

One more thing

You can put a credential on your CV by Friday.

Reward Shaping for a Quadruped Locomotion Policy | Ewance Challenge