Spec Trust-and-Safety Eval Harness for an LLM-Powered Customer-Support Bot
Overview
What this challenge is about.
You will spec a 6-page evaluation harness covering: (1) jailbreak test set (about 200 prompts across 6 attack families), (2) PII-leakage probes (about 100 synthetic-customer prompts), (3) harmful-output classifier integration (use Detoxify or similar), (4) regression-detection gating (block deploy if any axis regresses beyond a threshold). Produce a reference Python implementation that runs the three axes on a small example bot. Deliver the spec, the reference harness, a 1-page exec summary, and an example nightly-eval report.
The Brief
What you'll do, and what you'll demonstrate.
Spec and reference-implement a nightly trust-and-safety harness for an LLM customer-support bot covering jailbreaks, PII, and toxicity.
Earning criteria — what you'll demonstrate
- Design a multi-axis safety evaluation harness for LLM products
- Curate jailbreak and PII test sets at useful scale
- Integrate a toxicity classifier into automated gating
- Document a harness so engineering can pick it up next sprint
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
AI Safety Researcher
Designing a nightly safety eval harness for an LLM product is the AI safety researcher's textbook job at any enterprise-AI vendor.
This challenge sharpens
- llm-evaluation
- red-teaming
- pii-detection
MLOps Engineer
Gating deploys on regression-detection thresholds is the MLOps engineer's craft applied to safety axes.
This challenge sharpens
- regression-detection
- harness-design
- python
Prompt Engineer
Curating jailbreak test sets and reasoning about LLM failure axes is core prompt-engineer territory in safety-conscious product teams.
This challenge sharpens
- llm-evaluation
- red-teaming
- harness-design