Evaluate a Generative AI Image Tool with a Within-Subjects Study
Overview
What this challenge is about.
You will write a study protocol, recruit 20 participants (a Discord callout is fine), counterbalance the order of the two conditions, and run 45-minute sessions over Zoom. Collect three measures per condition: a 7-point Likert satisfaction rating, a 3-item perceived-control scale, and time-to-first-shareable-image (measured in wall-clock time). Analyze the within-subjects data with paired statistics: a paired t-test or a Wilcoxon signed-rank test, depending on the distribution of the paired differences. Deliver a protocol, an anonymized data sheet, an analysis notebook, and a 4-page results report with a clear ship/kill recommendation.
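Counterbalancing here means that half of the participants see one condition first and the other half see the other condition first, so order effects do not favor either condition. The following is a minimal sketch of one way to assign orders, not a prescribed procedure. The condition labels `guided` and `baseline`, the participant IDs `P01` to `P20`, and the function name `assign_orders` are all hypothetical placeholders.

```python
import random


def assign_orders(participant_ids, conditions=("A", "B"), seed=0):
    """Give each participant a condition order, balanced across both orders."""
    first, second = conditions
    orders = [(first, second), (second, first)]
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    rng.shuffle(ids)           # randomize who lands in which order group
    # Alternate the two orders down the shuffled list: an even number of
    # participants yields exactly half in each order.
    return {pid: orders[i % 2] for i, pid in enumerate(ids)}


# Hypothetical IDs and condition labels, used only for illustration.
participants = [f"P{n:02d}" for n in range(1, 21)]
schedule = assign_orders(participants, conditions=("guided", "baseline"))

for pid in sorted(schedule):
    print(pid, "->", " then ".join(schedule[pid]))
```

Recording the seed in the protocol lets a reviewer regenerate the same schedule. If a participant drops out, assigning the replacement to the same order keeps the two groups equal in size.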
The Brief
What you'll do, and what you'll demonstrate.
Use a clean within-subjects user study to decide whether guided-prompt onboarding genuinely improves first-week engagement.
Earning criteria — what you'll demonstrate
- Design and run a within-subjects user study with counterbalancing
- Choose and report appropriate paired statistics for ordinal and continuous data
- Evaluate generative-AI tools with both subjective and behavioral measures
- Translate study results into a single ship/kill recommendation
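As a rough illustration of the paired statistics named in the criteria above, the sketch below computes the paired t statistic with a standardized effect size (suited to a continuous measure such as time-to-first-shareable-image) and the Wilcoxon signed-rank statistic (suited to ordinal ratings such as the 7-point Likert item). It uses only the Python standard library and stops short of p-values. In an actual analysis notebook, a statistics library such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`) would typically supply those. The function names `paired_t` and `wilcoxon_w` are hypothetical.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and Cohen's d_z for two same-length lists."""
    diffs = [x - y for x, y in zip(a, b)]  # one difference per participant
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)         # sample SD of the differences
    return {
        "n": n,
        "mean_diff": mean_d,
        "t": mean_d / (sd_d / math.sqrt(n)),
        "df": n - 1,
        "d_z": mean_d / sd_d,              # effect size for paired designs
    }


def wilcoxon_w(a, b):
    """Wilcoxon signed-rank sums, dropping zero differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of differences tied in absolute value.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1         # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return {
        "n_nonzero": len(ordered),
        "w_plus": w_plus,
        "w_minus": w_minus,
        "w": min(w_plus, w_minus),
    }
```

Both functions expect the two lists in the same participant order, one value per participant per condition. Reporting `d_z` and the mean difference alongside the test statistic keeps the report about the size of the effect, not only whether it crossed a significance threshold.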
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Product Designer
Running a credible within-subjects study on a consumer AI surface, then shipping the recommendation, is exactly the work AI product designers do for growth teams.
This challenge sharpens
- user-study
- within-subjects-design
- human-ai-interaction
AI Product Manager
Owning a ship/kill recommendation backed by paired statistics mirrors an AI PM's day-to-day work at a feature-driven consumer-AI organization.
This challenge sharpens
- experiment-design
- evaluation
- statistical-analysis
AI Safety Researcher
Pre-registered evaluations and honest reporting of effect sizes are the methodological core of AI safety research on real users.
This challenge sharpens
- experiment-design
- evaluation
- statistical-analysis