Skip to contentSkip to content
Verified credentials. On-chain. Forever.Learn more
Cover image for Red-Team Evaluation of a Refusal Policy
Research

Red-Team Evaluation of a Refusal Policy

FreeVerified credential2 weeksAdvanced

Overview

What this challenge is about.

You receive the lab's written refusal policy (version 2.3) and a starter set of 60 red-team prompts (10 per category). Extend the set to 240 prompts (40 per category) using documented elicitation patterns (direct, indirect, role-play, multi-turn build-up). Run the prompts against the candidate model and the previous release. Score each response on a 4-level rubric (full refusal, soft refusal with explanation, partial compliance, full compliance) blind-graded by 2 raters. Report per-category rates, inter-rater agreement, and 10 worst-case failure traces. Recommend ship/no-ship plus monitoring metrics.

CredentialBlockchain-anchored
ShareableLinkedIn-ready
LanguageEnglish
PaceSelf-paced

The Brief

What you'll do, and what you'll demonstrate.

Run a structured red-team eval across 6 harm categories with per-category quantitative and qualitative findings, and recommend ship/no-ship.

Earning criteria — what you'll demonstrate

  • Design red-team prompts across multiple elicitation patterns
  • Apply a multi-level scoring rubric with inter-rater reliability
  • Detect regressions vs. a previous release
  • Communicate safety findings to a release-decision review

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Safety Researcher

Running structured red-team evals across multiple harm categories is the day-one job of AI safety researchers at every frontier lab.

This challenge sharpens

  • red-teaming
  • alignment-evaluation
  • refusal-policy

ML Researcher

Inter-rater methodology and regression analysis are core post-training research skills.

This challenge sharpens

  • alignment-evaluation
  • inter-rater-reliability
  • rubric-design

Applied AI Scientist

Translating eval results into a defensible ship/no-ship recommendation is applied-AI-scientist judgement work in safety-sensitive deployments.

This challenge sharpens

  • red-teaming
  • responsible-ai
  • alignment-evaluation

One more thing

You can put a credential on your CV by Friday.