Research

Run an Alignment Probe on a Coding Assistant

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You will design 240 probe prompts across 3 classes: (1) over-refusal (innocuous coding asks the model should fulfill), (2) insecure code patterns (asks where the model should warn about SQL injection, hardcoded secrets, etc.), (3) training-data leakage (oblique prompts attempting to elicit memorized snippets). Run them against the model API, hand-score outputs against a clear rubric, and write a 6-page red-team report following a model-card-friendly structure (executive summary, methodology, findings, severity table, recommendations).
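One way to organize the probe set and the run is sketched below. It is a minimal sketch under stated assumptions: the even 80/80/80 split, the `Probe` record, the `build_probe_set` and `run_probes` helpers, and the `fake_model` stand-in are all hypothetical. The challenge itself specifies only 240 prompts across three classes. Replace `fake_model` with a call to whatever client your model API actually provides.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# The three probe classes named in the overview.
CLASSES = ("over_refusal", "insecure_code", "data_leakage")
TOTAL_PROBES = 240  # from the challenge overview


@dataclass(frozen=True)
class Probe:
    probe_id: str
    probe_class: str
    prompt: str


def build_probe_set(prompts_by_class: dict[str, list[str]]) -> list[Probe]:
    """Flatten per-class prompt lists into Probe records with stable IDs."""
    probes = []
    for cls in CLASSES:
        for i, prompt in enumerate(prompts_by_class[cls], start=1):
            probes.append(Probe(f"{cls}-{i:03d}", cls, prompt))
    return probes


def run_probes(probes: list[Probe], query_model: Callable[[str], str]) -> list[dict]:
    """Send each prompt to the model and keep the raw output for hand-scoring."""
    return [
        {
            "probe_id": p.probe_id,
            "probe_class": p.probe_class,
            "prompt": p.prompt,
            "output": query_model(p.prompt),
            "score": None,  # filled in later by a human rater
        }
        for p in probes
    ]


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real API call.
    return f"[model output for: {prompt}]"


# Assumption: an even split of 80 prompts per class. Placeholder prompts only.
per_class = TOTAL_PROBES // len(CLASSES)
prompts = {
    cls: [f"placeholder {cls} prompt {i}" for i in range(per_class)]
    for cls in CLASSES
}
probe_set = build_probe_set(prompts)
records = run_probes(probe_set, fake_model)
class_counts = Counter(r["probe_class"] for r in records)
```

Keeping `score` empty at run time separates collection from judgment, so the same raw outputs can be scored by more than one rater against the rubric.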

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Probe a 14B coding assistant for over-refusal, insecure code generation, and data leakage, and publish a model-card-ready red-team report.

Earning criteria — what you'll demonstrate

  • Design probe prompts for distinct alignment failure modes
  • Apply consistent rubrics for hand-scoring LLM outputs
  • Build a severity ranking that combines likelihood and impact
  • Write a public-facing red-team report
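A severity ranking that combines likelihood and impact can be sketched as a simple risk matrix. Everything specific here is an assumption made for illustration: the 1 to 4 impact scale, the likelihood thresholds in `LIKELIHOOD_BUCKETS`, the multiplicative score, and the example findings, whose numbers are invented and are not results.

```python
from dataclasses import dataclass

# Assumption: likelihood is bucketed from the observed failure rate.
# Each entry is (minimum failure rate, likelihood level), highest first.
LIKELIHOOD_BUCKETS = [(0.25, 4), (0.10, 3), (0.02, 2), (0.0, 1)]


@dataclass
class Finding:
    name: str
    failures: int  # probes hand-scored as failing
    probes: int    # probes run for this finding
    impact: int    # assumption: analyst-assigned, 1 (minor) to 4 (critical)


def likelihood_level(failures: int, probes: int) -> int:
    """Map an observed failure rate to an ordinal likelihood level."""
    rate = failures / probes
    for min_rate, level in LIKELIHOOD_BUCKETS:
        if rate >= min_rate:
            return level
    return 1


def severity(finding: Finding) -> int:
    """Severity = likelihood level x impact level (range 1 to 16)."""
    return likelihood_level(finding.failures, finding.probes) * finding.impact


def rank(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least severe."""
    return sorted(findings, key=severity, reverse=True)


# Illustrative, made-up numbers only.
findings = [
    Finding("refuses benign file-deletion script", failures=24, probes=80, impact=2),
    Finding("verbatim memorized snippet", failures=1, probes=80, impact=4),
    Finding("hardcoded secret without warning", failures=12, probes=80, impact=4),
]
ranked = rank(findings)
```

With these invented inputs, a frequent but low-impact failure ranks below a less frequent, high-impact one, which is the point of combining the two dimensions rather than sorting by failure rate alone.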

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Safety Researcher

Designing and running an alignment red-team exercise is core day-to-day work for safety researchers at frontier AI labs.

This challenge sharpens

  • red-teaming
  • alignment-evaluation
  • risk-assessment

ML Researcher

Disciplined probe design and inter-rater scoring are the methodological foundation of empirical LLM research.

This challenge sharpens

  • prompt-design
  • llm-evaluation
  • alignment-evaluation
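Inter-rater scoring, as mentioned for this role, is commonly checked with an agreement statistic. The sketch below computes Cohen's kappa for two raters; choosing kappa, the `cohens_kappa` helper, and the pass/fail labels are assumptions for illustration, since the challenge does not name a specific agreement measure.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four probe outputs.
scores_a = ["pass", "pass", "fail", "fail"]
scores_b = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(scores_a, scores_b)  # 0.5
```

Here the raters agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.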

Research Scientist

Writing a model-card-ready red-team report produces the kind of publishable output expected of a junior research scientist at an AI lab.

This challenge sharpens

  • report-writing
  • red-teaming
  • alignment-evaluation

One more thing

You can put a credential on your CV by Friday.