Benchmark Reward-from-Feedback Methods on a Tabletop Pick-Place

Free · Verified credential · 4 weeks · Expert

Overview

What this challenge is about.

You will use a Franka Panda arm in PyBullet on a 4-object pick-and-place task. For each of the three feedback methods, train a reward model and then a downstream policy, stopping at convergence or when a 6-hour budget runs out, whichever comes first. Use a scripted oracle for the bulk of the feedback, since it is cheap, then run a 4-person pilot to estimate the real human-operator time per feedback unit. Report sample efficiency (return vs. number of queries), policy quality (success rate), and operator burden (seconds per feedback unit). Deliver a 6-page benchmark note plus code and trained checkpoints.
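The scripted oracle is what makes large feedback volumes cheap. The brief does not name the three methods, so as an illustrative assumption take pairwise preference comparison, one common feedback type. A minimal sketch then pairs an oracle that prefers the segment with the higher ground-truth return with a Bradley-Terry reward model fitted to its labels. The linear reward over per-step features, the random stand-in features, and the names `scripted_preference` and `fit_bradley_terry` are all hypothetical. A real run would use PyBullet rollouts of the Panda arm and most likely a neural reward model.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: sigmoid(z) = 0.5 * (1 + tanh(z / 2)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def scripted_preference(seg_a, seg_b, true_w, rng, temperature=0.0):
    """Hypothetical scripted oracle for pairwise preferences.

    seg_a, seg_b: (T, d) arrays of per-step features for two trajectory segments.
    Returns 1 if segment a is preferred, 0 otherwise. With temperature > 0 the
    oracle is Boltzmann-noisy instead of perfectly rational.
    """
    gap = (seg_a.sum(axis=0) - seg_b.sum(axis=0)) @ true_w  # ground-truth return gap
    if temperature > 0:
        return int(rng.random() < sigmoid(gap / temperature))
    return int(gap > 0)


def fit_bradley_terry(pairs, labels, lr=0.05, epochs=500):
    """Fit a linear reward r(s) = w . phi(s) under the Bradley-Terry model.

    P(a preferred over b) = sigmoid(R(a) - R(b)), where R sums r over the segment,
    so each pair reduces to the difference of its summed features.
    """
    X = np.stack([a.sum(axis=0) - b.sum(axis=0) for a, b in pairs])  # (N, d)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the mean negative log-likelihood
    return w


# Toy usage: random Gaussian features stand in for PyBullet rollouts (assumption).
rng = np.random.default_rng(0)
dim, horizon = 4, 20
true_w = np.array([1.0, -0.5, 2.0, 0.0])  # hypothetical ground-truth reward weights
pairs = [(rng.normal(size=(horizon, dim)), rng.normal(size=(horizon, dim))) for _ in range(300)]
labels = [scripted_preference(a, b, true_w, rng) for a, b in pairs]
w_hat = fit_bradley_terry(pairs, labels)
cos = float(w_hat @ true_w / (np.linalg.norm(w_hat) * np.linalg.norm(true_w)))
print(f"cosine similarity between learned and true reward weights: {cos:.3f}")
```

A preference-trained reward is only identified up to scale, so the check compares directions (cosine similarity) rather than raw weights.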

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Rank three reward-from-feedback methods by sample efficiency, policy quality, and operator burden on a single controlled task.
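One way to turn "return vs. queries" into a ranking is to summarize each seed's curve by its normalized area under the curve. You then report the mean across seeds with a percentile bootstrap interval. This is an assumption, not something the brief prescribes. The method names and numbers below are toy placeholders, not results.

```python
import numpy as np


def normalized_auc(query_budgets, returns):
    """Trapezoidal area under a return-vs-queries curve divided by the query range,
    so the score stays in return units and is comparable across budgets."""
    q = np.asarray(query_budgets, dtype=float)
    r = np.asarray(returns, dtype=float)
    area = np.sum(0.5 * (r[1:] + r[:-1]) * np.diff(q))
    return float(area / (q[-1] - q[0]))


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean across seeds with a percentile-bootstrap (1 - alpha) interval."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))  # resample seeds
    boot_means = values[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)


# Toy placeholder curves, NOT results: {method: [one return curve per seed]}.
budgets = [0, 100, 200, 400]
curves = {
    "method_A": [[0, 2, 4, 6], [0, 3, 5, 6], [0, 2, 5, 7]],
    "method_B": [[0, 1, 2, 5], [0, 1, 3, 5], [0, 2, 2, 4]],
}
summary = {
    name: bootstrap_ci([normalized_auc(budgets, c) for c in seed_curves])
    for name, seed_curves in curves.items()
}
ranking = sorted(summary, key=lambda name: summary[name][0], reverse=True)
for name in ranking:
    mean, lo, hi = summary[name]
    print(f"{name}: AUC {mean:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

With only three seeds the bootstrap interval is coarse. If two methods' intervals overlap heavily, that is worth stating in the note rather than forcing a strict order.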

Earning criteria — what you'll demonstrate

Implement and compare reward-from-feedback methods in a controlled task
Design a benchmark that fairly compares methods despite different feedback shapes
Quantify operator burden alongside policy quality
Write an internal research note appropriate for a lab audience
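Because the scripted oracle answers instantly, its runs say nothing about operator burden. The 4-person pilot supplies the seconds-per-unit figure. The sketch below assumes pilot logs are stored as per-operator lists of seconds. It takes each operator's median, then the median across operators, and uses that figure to re-express query budgets as operator time. Plotting return against estimated operator time instead of raw query count is one way to compare methods whose feedback units differ in size. All names and timings here are hypothetical.

```python
import statistics


def seconds_per_unit(pilot_timings):
    """Median-of-medians operator time per feedback unit.

    pilot_timings: {operator_id: [seconds spent on each feedback unit]}.
    Taking each operator's median first gives every pilot operator equal weight,
    however many units they completed, and damps one-off slow responses.
    """
    per_operator = [statistics.median(times) for times in pilot_timings.values()]
    return statistics.median(per_operator)


def operator_hours(query_budgets, sec_per_unit):
    """Re-express a query axis as estimated human-operator hours."""
    return [q * sec_per_unit / 3600.0 for q in query_budgets]


def queries_within(hours, sec_per_unit):
    """How many feedback units fit into a fixed operator-time budget."""
    return int(hours * 3600.0 // sec_per_unit)


# Hypothetical pilot logs for two methods with different feedback shapes.
pilot = {
    "method_A": {"op1": [4.0, 5.0, 6.0], "op2": [3.0, 4.0], "op3": [6.0, 8.0, 7.0], "op4": [5.0]},
    "method_B": {"op1": [30.0, 40.0], "op2": [35.0], "op3": [50.0, 45.0, 40.0], "op4": [38.0, 42.0]},
}
cost = {name: seconds_per_unit(logs) for name, logs in pilot.items()}
for name, sec in cost.items():
    print(f"{name}: {sec:.1f} s/unit, {queries_within(1.0, sec)} units per operator-hour")
```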

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Human-Robot Interaction

Master's · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Research Scientist
AI Research

Research Scientist

Owning a controlled benchmark across feedback methods and writing the internal note is the entry-level work of a research scientist at an AI lab.

This challenge sharpens

reward-learning
preference-comparison
experiment-design

ML Researcher

Sample-efficiency reporting with multiple seeds and honest caveats is the methodological core of ML research.

This challenge sharpens

reinforcement-learning
benchmarking
experiment-design

AI Safety Researcher

Reward-from-feedback methods sit squarely within alignment and safety research; this benchmark gives you a credible safety-research artefact.

This challenge sharpens

reward-learning
preference-comparison
benchmarking

One more thing

You can put a credential on your CV by Friday.
