Run an Alignment Probe on a Coding Assistant

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You will design 240 probe prompts across three classes: (1) over-refusal (innocuous coding requests the model should fulfill), (2) insecure code patterns (requests where the model should warn about SQL injection, hardcoded secrets, and similar risks), and (3) training-data leakage (oblique prompts that attempt to elicit memorized snippets). You will then run the prompts against the model API, hand-score the outputs against a clear rubric, and write a 6-page red-team report that follows a model-card-friendly structure (executive summary, methodology, findings, severity table, recommendations).
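As a rough illustration of how the probe set and hand scores might be organized, here is a minimal Python sketch. The class labels, the `Probe` record, the ID format, and the binary 0/1 score are all assumptions made for this example; the challenge itself does not prescribe a data format or a rubric scale.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels mirroring the three probe classes described above.
CLASSES = ("over_refusal", "insecure_code", "data_leakage")


@dataclass
class Probe:
    probe_id: str
    probe_class: str             # one of CLASSES
    prompt: str
    response: str = ""           # filled in after the model API call
    score: Optional[int] = None  # hand-assigned; assumed rubric: 0 = pass, 1 = fail


def build_probe_set(prompts_by_class):
    """Flatten {class: [prompt, ...]} into Probe records with stable IDs."""
    probes = []
    for probe_class in CLASSES:
        prompts = prompts_by_class.get(probe_class, [])
        for i, prompt in enumerate(prompts, start=1):
            probes.append(Probe(f"{probe_class}-{i:03d}", probe_class, prompt))
    return probes


def failure_rates(probes):
    """Per-class share of hand-scored probes marked as failures."""
    scored, failed = Counter(), Counter()
    for p in probes:
        if p.score is None:      # not yet hand-scored
            continue
        scored[p.probe_class] += 1
        failed[p.probe_class] += p.score
    return {c: failed[c] / scored[c] for c in CLASSES if scored[c]}
```

Stable IDs make it easier to trace a row in the findings section back to the exact prompt and response, and keeping unscored probes out of the rate avoids understating failures while scoring is still in progress.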

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Probe a 14B coding assistant for over-refusal, insecure code generation, and data leakage, and publish a model-card-ready red-team report.

Earning criteria — what you'll demonstrate

Design probe prompts for distinct alignment failure modes
Apply consistent rubrics for hand-scoring LLM outputs
Build a severity ranking that combines likelihood and impact
Write a public-facing red-team report
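One common way to combine likelihood and impact into a severity ranking is a simple product score. The sketch below assumes 1 to 5 ordinal scales and hypothetical band cut-offs (15 and 6); neither is specified by the challenge, so treat them as placeholders to replace with your own rubric.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)
    impact: int      # assumed scale: 1 (minor) to 5 (critical)

    def __post_init__(self):
        for field_name in ("likelihood", "impact"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be in 1..5, got {value}")


def severity(finding):
    """Product score: higher means more severe (range 1..25)."""
    return finding.likelihood * finding.impact


def band(score):
    """Map a product score to a label; the cut-offs are hypothetical."""
    if score >= 15:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"


def severity_table(findings):
    """Rows of (name, likelihood, impact, score, band), most severe first."""
    ranked = sorted(findings, key=severity, reverse=True)
    return [
        (f.name, f.likelihood, f.impact, severity(f), band(severity(f)))
        for f in ranked
    ]
```

A product score is easy to explain in a public report, but it treats a rare critical issue and a frequent minor one as equal; if that matters for your findings, note it in the methodology section or rank ties by impact.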

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Large Language Models

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Safety Researcher

Designing and running an alignment red-team exercise is core day-to-day work for safety researchers at frontier AI labs.

This challenge sharpens

red-teaming
alignment-evaluation
risk-assessment

ML Researcher

Disciplined probe design and inter-rater scoring are the methodological foundation of empirical LLM research.

This challenge sharpens

prompt-design
llm-evaluation
alignment-evaluation

Research Scientist

Writing a model-card-ready red-team report is exactly the publishable output expected of a junior research scientist at an AI lab.

This challenge sharpens

report-writing
red-teaming
alignment-evaluation

One more thing

You can put a credential on your CV by Friday.
