Train a Reinforcement-Learning Policy for Drone Obstacle Avoidance
Overview
What this challenge is about.
You receive a custom Gymnasium drone-flight environment (provided), a baseline hand-engineered controller, and a target evaluation suite covering 4 obstacle densities. Train a PPO policy with vectorized rollouts (Stable-Baselines3 or RLlib) within a fixed compute budget of 24 GPU-hours. Evaluate over 200 rollouts per density, measuring success rate, mean path length, and collision rate. Compare the results against the hand-engineered baseline and write a 4-page sim-to-real gap memo that names what's needed to move to a real-drone trial.
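The per-density evaluation described above can be sketched as a small aggregation step. This is a minimal illustration, not the challenge's own evaluation harness: the `Rollout` record, its field names, the `summarize` helper, and the density labels are all hypothetical, and it assumes mean path length is averaged over every rollout rather than over successful ones only.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rollout:
    """One evaluation episode (hypothetical record; field names are assumptions)."""
    density: str        # obstacle-density label, e.g. "sparse" or "dense"
    success: bool       # True if the drone reached the goal
    collided: bool      # True if the episode ended in a collision
    path_length: float  # distance flown during the episode


def summarize(rollouts):
    """Group rollouts by obstacle density and compute the three metrics."""
    by_density = defaultdict(list)
    for rollout in rollouts:
        by_density[rollout.density].append(rollout)

    summary = {}
    for density, group in by_density.items():
        n = len(group)
        summary[density] = {
            "n": n,
            "success_rate": sum(r.success for r in group) / n,
            "collision_rate": sum(r.collided for r in group) / n,
            # Assumption: averaged over all rollouts, not only successful ones.
            "mean_path_length": mean(r.path_length for r in group),
        }
    return summary
```

Running the same function on the PPO policy's rollouts and on the baseline controller's rollouts gives two tables with identical keys, which makes the per-density comparison straightforward. Whether path length should be reported for successful episodes only is a design choice worth stating in the write-up.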
The Brief
What you'll do, and what you'll demonstrate.
Train a PPO obstacle-avoidance policy that beats the hand-engineered baseline across obstacle densities and supports a credible sim-to-real plan.
Earning criteria — what you'll demonstrate
- Apply PPO to a continuous-control robotics task end-to-end
- Design structured evaluation suites for RL policies
- Reason about the sim-to-real gap explicitly
- Communicate RL trade-offs to a non-RL audience
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
End-to-end RL training with structured evaluation and an honest sim-to-real memo is the canonical first project for a junior ML researcher on a robotics team.
This challenge sharpens
- reinforcement-learning
- ppo
- policy-evaluation
Research Scientist
Domain-randomization design and per-condition evaluation discipline are the research-scientist skills that get cited in robotics labs.
This challenge sharpens
- reinforcement-learning
- sim-to-real
- policy-evaluation
Machine Learning Engineer
Reproducible RL training infrastructure with Docker and Weights & Biases (W&B) is the MLE-flavored half of any RL project.
This challenge sharpens
- pytorch
- robotics-simulation
- ppo