Research

Benchmark Reward-from-Feedback Methods on a Tabletop Pick-Place

Free | Verified credential | 4 weeks | Expert

Overview

What this challenge is about.

You will use a Franka Panda arm simulated in PyBullet on a 4-object pick-and-place task. For each of the three feedback methods, train a reward model and a downstream policy until convergence or until a 6-hour training budget runs out, whichever comes first. Use a scripted oracle for the bulk of the feedback, since it is cheap, then run a 4-person pilot to estimate real human-operator time per feedback unit. Report sample efficiency (return vs. number of queries), policy quality (success rate), and operator burden (seconds per feedback unit). Deliver a 6-page benchmark note plus code and trained checkpoints.
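As a rough illustration of how the three reported quantities could be computed from a run's logs, here is a minimal Python sketch. The `RunLog` structure, its field names, and the choice of a normalized area under the return-vs-queries curve as the sample-efficiency summary are all assumptions made for this example; the brief does not prescribe them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunLog:
    """One training run for one feedback method (hypothetical structure)."""
    queries: list            # cumulative feedback queries at each evaluation point
    returns: list            # mean episode return at each evaluation point
    successes: int           # successful episodes in the final evaluation
    episodes: int            # total episodes in the final evaluation
    feedback_seconds: list   # pilot timings, one entry per feedback unit


def return_auc(queries, returns):
    """Trapezoidal area under return-vs-queries, divided by the query span."""
    area = 0.0
    for i in range(1, len(queries)):
        width = queries[i] - queries[i - 1]
        area += 0.5 * (returns[i] + returns[i - 1]) * width
    span = queries[-1] - queries[0]
    # With a single evaluation point there is no curve; fall back to that value.
    return area / span if span > 0 else returns[0]


def summarize(run):
    """Collect the three headline metrics for one run."""
    return {
        "return_auc": return_auc(run.queries, run.returns),
        "success_rate": run.successes / run.episodes,
        "seconds_per_feedback": mean(run.feedback_seconds),
    }
```

In practice you would call `summarize` once per seed and per method, then aggregate across seeds before comparing methods.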

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Rank three reward-from-feedback methods on sample efficiency, policy quality, and operator burden on a single, controlled task.

Earning criteria — what you'll demonstrate

  • Implement and compare reward-from-feedback methods in a controlled task
  • Design a benchmark that fairly compares methods despite different feedback shapes
  • Quantify operator burden alongside policy quality
  • Write an internal research note appropriate for a lab audience
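One possible way to compare methods whose feedback units differ in cost is to put them on a shared operator-time axis, using the per-unit time estimated in the pilot. The sketch below is only one option under that assumption; the function names and the example numbers are hypothetical and are not results from the challenge.

```python
def to_operator_seconds(queries, seconds_per_unit):
    """Convert cumulative query counts into cumulative operator seconds."""
    return [q * seconds_per_unit for q in queries]


def return_at_budget(queries, returns, seconds_per_unit, budget_seconds):
    """Return at the last evaluation point whose operator cost fits the budget.

    Returns None if even the first evaluation point exceeds the budget.
    """
    best = None
    for q, r in zip(queries, returns):
        if q * seconds_per_unit <= budget_seconds:
            best = r
    return best


# Made-up learning curves for two hypothetical methods.
method_a = {"queries": [0, 100, 200], "returns": [0.0, 0.5, 0.9], "sec": 3.0}
method_b = {"queries": [0, 50, 100], "returns": [0.0, 0.6, 0.8], "sec": 8.0}

budget = 300.0  # operator seconds allowed for each method
at_budget_a = return_at_budget(
    method_a["queries"], method_a["returns"], method_a["sec"], budget
)
at_budget_b = return_at_budget(
    method_b["queries"], method_b["returns"], method_b["sec"], budget
)
```

Comparing at an equal operator-time budget rather than an equal query count avoids favouring a method simply because each of its queries asks more of the operator.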

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Research Scientist

Owning a controlled benchmark across feedback methods and writing the internal note is the entry-level work of a research scientist at an AI lab.

This challenge sharpens

  • reward-learning
  • preference-comparison
  • experiment-design

ML Researcher

Sample-efficiency reporting with multiple seeds and honest caveats is the methodological core of ML research.

This challenge sharpens

  • reinforcement-learning
  • benchmarking
  • experiment-design

AI Safety Researcher

Reward-from-feedback methods sit squarely in alignment-and-safety research; this benchmark gives you a credible safety-research artefact.

This challenge sharpens

  • reward-learning
  • preference-comparison
  • benchmarking

