Run an A/B Test on Two System Prompts for a Sales Email Assistant
Overview
What this challenge is about.
You will (1) design the A/B test (random assignment by rep_id, 50/50 split, 2-week duration); (2) instrument three primary metrics: reply rate (event-based), average tokens per email, and mean edit distance between the generated text and the text the rep actually sent; (3) build a small analysis notebook that computes per-variant metrics with bootstrap confidence intervals and a Bayesian posterior on the lift; and (4) write a 4-page ship/kill memo. The 2-week field run is simulated with a provided synthetic event stream that reflects realistic noise. Deliverables: an experiment design doc, an analysis notebook, and a 4-page memo.
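One common way to get a stable 50/50 split by rep_id is to hash the ID with an experiment-specific salt, so a rep always sees the same system prompt for the whole run. The sketch below is illustrative only: the function name `assign_variant`, the salt string, and the variant labels are assumptions, not part of the provided materials.

```python
import hashlib


def assign_variant(rep_id: str, salt: str = "sales-email-prompt-ab") -> str:
    """Deterministically map a rep to 'control' or 'treatment' (50/50).

    The salt is a hypothetical experiment name; changing it reshuffles
    assignments, which keeps this test independent of other experiments.
    """
    digest = hashlib.sha256(f"{salt}:{rep_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # integer bucket in 0..99
    return "treatment" if bucket < 50 else "control"


if __name__ == "__main__":
    # Made-up rep IDs, for illustration only.
    for rep in ["rep-001", "rep-002", "rep-003"]:
        print(rep, assign_variant(rep))
```

Because assignment depends only on the rep_id and the salt, it can be recomputed server-side on every request without storing an assignment table.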
The Brief
What you'll do, and what you'll demonstrate.
Decide whether the new system prompt beats the existing one on reply rate without inflating tokens or rep-edit work.
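The rep-edit guardrail needs a concrete definition of edit distance. The brief does not say which one to use, so the sketch below assumes character-level Levenshtein distance, averaged over (generated, sent) pairs. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def mean_edit_distance(pairs):
    """Average distance over (generated_text, sent_text) pairs."""
    if not pairs:
        return 0.0
    return sum(levenshtein(g, s) for g, s in pairs) / len(pairs)
```

Raw character distance grows with email length, so it may be worth also reporting a length-normalized version when the two prompts produce emails of different lengths.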
Earning criteria — what you'll demonstrate
- Design a server-side A/B test for an LLM-powered feature
- Choose and instrument metrics that triangulate quality, cost, and effort
- Analyse A/B results with bootstrap CIs and Bayesian posteriors
- Write a ship/kill memo a non-technical growth team can act on
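The analysis criterion can be sketched with the standard library alone. The example assumes each variant's data has been aggregated to one `(emails_sent, replies)` tuple per rep. Because randomization is by rep, the bootstrap resamples reps rather than individual emails. The Bayesian part uses a Beta(1, 1) prior on each variant's reply rate and treats emails as independent, which is a simplification. All function names and the data layout are assumptions.

```python
import random


def reply_rate(reps):
    """reps: list of (emails_sent, replies) tuples, one per rep."""
    sent = sum(s for s, _ in reps)
    replies = sum(r for _, r in reps)
    return replies / sent if sent else 0.0


def bootstrap_lift_ci(control, treatment, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for (treatment rate - control rate)."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))      # resample reps
        t = rng.choices(treatment, k=len(treatment))  # with replacement
        lifts.append(reply_rate(t) - reply_rate(c))
    lifts.sort()
    lo = lifts[int((alpha / 2) * n_boot)]
    hi = lifts[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def prob_treatment_better(control, treatment, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(treatment rate > control rate)."""
    rng = random.Random(seed)

    def posterior_params(reps):
        sent = sum(s for s, _ in reps)
        replies = sum(r for _, r in reps)
        return 1 + replies, 1 + sent - replies  # Beta(1, 1) prior

    a_c, b_c = posterior_params(control)
    a_t, b_t = posterior_params(treatment)
    wins = sum(
        rng.betavariate(a_t, b_t) > rng.betavariate(a_c, b_c)
        for _ in range(n_draws)
    )
    return wins / n_draws
```

In the memo, the interval and the posterior probability answer different questions: the interval bounds how large the lift plausibly is, while the probability says how confident you can be that there is any lift at all.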
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Prompt Engineer
Running an A/B test on a prompt change with proper statistics is routine work for prompt engineers on growth-driven AI products.
This challenge sharpens
- prompt-evaluation
- ab-testing
- prompt-engineering
Data Scientist
Bootstrap CIs and Bayesian posteriors on a feature A/B test are core tools for data scientists on growth teams.
This challenge sharpens
- ab-testing
- bayesian-analysis
- experiment-design
AI Product Manager
Owning a pre-registered prompt experiment and its ship/kill memo is a core AI PM responsibility in growth-led AI orgs.
This challenge sharpens
- experiment-design
- metric-design
- ab-testing