Run a Human-Preference Study Comparing Two Coding Assistants

Free · Verified credential · 3 weeks · Intermediate

Overview

What this challenge is about.

Design a blinded paired-comparison study. Twelve developer participants each receive the same 8 realistic coding tasks (refactoring, writing a function, debugging, testing). Both assistants solve every task, and participants choose the output they prefer without knowing which assistant produced it. Randomize the order in which the two assistants' outputs are shown, and pre-register a primary outcome (the proportion of comparisons in which Assistant A is preferred) together with a sample-size justification. Analyze the paired preferences with a binomial test against a null of no preference (50%) and a small Bayesian alternative. Report the effect size with a confidence interval. Deliver a 5-page report and a 30-minute founder briefing.
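The randomization step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `build_schedule`, the participant and task labels, and the seed value are all hypothetical. It assumes that display order is balanced within each participant (Assistant A shown first on half of the tasks) and that participants only ever see the neutral labels "Output 1" and "Output 2".

```python
import random

def build_schedule(participants, tasks, seed=12345):
    """Return one record per (participant, task) comparison.

    Assumptions (not from the brief): display order is balanced within each
    participant, and the seed is fixed so the schedule can be pre-registered.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    schedule = []
    for participant in participants:
        # Half of the tasks show Assistant A first, the rest show B first.
        half = len(tasks) // 2
        first_shown = ["A"] * half + ["B"] * (len(tasks) - half)
        rng.shuffle(first_shown)

        # Shuffle task order too, so task position is not confounded with task type.
        task_order = list(tasks)
        rng.shuffle(task_order)

        for position, (task, first) in enumerate(zip(task_order, first_shown), start=1):
            second = "B" if first == "A" else "A"
            schedule.append({
                "participant": participant,
                "position": position,   # order in which this participant sees the task
                "task": task,
                "output_1": first,      # hidden from participants (blinding key)
                "output_2": second,
            })
    return schedule

# Hypothetical labels for the 12 participants and 8 tasks.
participants = [f"P{i:02d}" for i in range(1, 13)]
tasks = [f"task-{i}" for i in range(1, 9)]
schedule = build_schedule(participants, tasks)
```

Fixing the seed before data collection lets the schedule itself be attached to the pre-registration. The `output_1` and `output_2` columns act as the unblinding key and would be kept away from whoever runs the sessions.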

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Run a pre-registered human-preference study comparing two coding assistants and produce a vendor-decision recommendation.

Earning criteria — what you'll demonstrate

Design a pre-registered human-preference study
Justify sample size before collecting data
Analyze paired-comparison data with frequentist and Bayesian methods
Present a vendor decision under statistical uncertainty
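The sample-size and analysis criteria above can be illustrated with a small, standard-library-only Python sketch. Several things here are assumptions rather than part of the brief: the two-sided p-value uses the "double the smaller tail" convention, the interval is a Wilson score interval, the Bayesian alternative is a uniform Beta(1, 1) prior on the preference proportion, and all function names are hypothetical. The sketch also treats the 96 comparisons (12 participants × 8 tasks) as independent, which they are not strictly, since each participant contributes 8 choices. A participant-level analysis would be a more conservative check.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def two_sided_p(k, n, p0=0.5):
    """Exact binomial p-value, doubling the smaller tail (capped at 1)."""
    lower = binom_cdf(k, n, p0)            # P(X <= k)
    upper = 1.0 - binom_cdf(k - 1, n, p0)  # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))

def exact_power(n, p_alt, alpha=0.05, p0=0.5):
    """Probability of rejecting the null when the true proportion is p_alt."""
    return sum(
        binom_pmf(k, n, p_alt)
        for k in range(n + 1)
        if two_sided_p(k, n, p0) <= alpha
    )

def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p_hat = k / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def posterior_prob_a_preferred(k, n):
    """P(p > 0.5 | data) under a uniform Beta(1, 1) prior.

    The posterior is Beta(k + 1, n - k + 1). For integer parameters the
    Beta CDF equals a binomial tail, which gives the closed form
    P(p > 0.5) = P(Y <= k) with Y ~ Binomial(n + 1, 0.5).
    """
    return binom_cdf(k, n + 1, 0.5)

# Made-up illustration only: 60 of 96 comparisons prefer Assistant A.
n_comparisons = 12 * 8
k_prefer_a = 60
p_value = two_sided_p(k_prefer_a, n_comparisons)
ci_low, ci_high = wilson_interval(k_prefer_a, n_comparisons)
post_prob = posterior_prob_a_preferred(k_prefer_a, n_comparisons)
power_at_65 = exact_power(n_comparisons, p_alt=0.65)  # assumed effect of interest
```

For the sample-size justification, `exact_power` can be evaluated before any data are collected for whichever preference proportion the pre-registration names as the smallest effect worth acting on. The 0.65 used here is only a placeholder.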

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

AI Measurement and Evaluation

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Applied AI Scientist
AI Research

Applied AI Scientist

Designing a pre-registered evaluation for a real vendor decision is the applied AI scientist's contribution to product orgs.

This challenge sharpens

experiment-design
statistical-evaluation
llm-evaluation

Data Scientist

Paired-comparison analysis with honest uncertainty is bread-and-butter data-scientist craft.

This challenge sharpens

statistical-evaluation
experiment-design
pre-registration

AI Product Manager

Turning an evaluation into a defensible vendor decision is exactly the AI PM's contribution to the procurement conversation.

This challenge sharpens

stakeholder-communication
human-evaluation
llm-evaluation

One more thing

You can put a credential on your CV by Friday.
