Human Evaluation
If you like applying Human Evaluation, every challenge here gives you a chance to practice it on a real industry brief.
Research · Expert · New
Stress-Test Scalable Oversight on a Tool-Using Agent
Design a sandwich-oversight study: pick a task domain where non-expert oversight is plausible but not trivial (e.g., reviewing data-analysis steps, checking small bug fixes, eva…
- Scalable Oversight
- Alignment Research
- Experiment Design
AI Safety and Alignment

Research · Intermediate · New
Run a Human-Preference Study Comparing Two Coding Assistants
Design a blinded paired-comparison study: 12 developer participants, each gets the same 8 realistic coding tasks (refactor, write a function, debug, test), each task is solved b…
- Experiment Design
- Statistical Evaluation
- Human Evaluation
AI Measurement and Evaluation
How it works
From brief to credential, in six steps.
Step 01
Browse challenges aligned to your studies.
Step 02
Accept the one that fits your goals.
Step 03
Work through it with AI Copilot guidance.
Step 04
Submit for structured evaluation.
Step 05
Earn a verified credential.
Step 06
Add it to LinkedIn with one click.
Industry teams behind a decade of practitioner briefs
Hiring from this pool?
Sponsor a challenge and meet candidates through actual work.
Industry teams can shape briefs around the skills they hire for, then evaluate students on rubric-scored deliverables — not resumes.