Red-Team a Customer-Service Chatbot for Jailbreak Resistance
Overview
What this challenge is about.
Use a published taxonomy of jailbreak categories (prompt injection, persona override, encoded payloads, multi-turn escalation, refusal bypass, tool-misuse). For each category, design at least 8 attack prompts (so 48+ total). Run them against the chatbot in a controlled environment, score success/failure with a documented rubric, and quantify success rate per category with bootstrap confidence intervals. Produce a 6-page red-team report including 5 worked-example failures, recommended mitigations (input filters, system-prompt hardening, tool gating), and a 1-page exec summary. Do not test on production traffic; use a staging endpoint.
The Brief
What you'll do, and what you'll demonstrate.
Run a structured red-team campaign on a customer-service LLM chatbot and ship a remediation-ready report.
Earning criteria — what you'll demonstrate
- Apply a published jailbreak taxonomy to a real product
- Design and score adversarial prompts systematically
- Quantify safety with honest statistics, not vibe scores
- Translate findings into mitigations engineers can ship
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Safety Researcher
Structured red-teaming with statistics and remediation framing is the AI safety researcher's most marketable craft right now.
This challenge sharpens
- red-teaming
- jailbreak-analysis
- safety-evaluation
Prompt Engineer
Designing attack prompts across a taxonomy is exactly the prompt engineer's adversarial mode of operation.
This challenge sharpens
- prompt-engineering
- jailbreak-analysis
- llm-evaluation
AI Engineer
Translating red-team findings into shippable mitigations (filters, gating, system-prompt hardening) is the AI engineer's daily contribution.
This challenge sharpens
- llm-evaluation
- prompt-engineering
- safety-evaluation