Catastrophic-Forgetting Audit on a Domain Fine-Tune

Free · Verified credential · 2 weeks · Advanced

Overview

What this challenge is about.

You receive a fine-tuned 7B chemistry model and its base model, plus a benchmark basket (an MMLU subset, GSM8K, IFEval, and a small instruction-following set). Run all four benchmarks on both models, averaging results over seeds. Identify the largest regression, then design and run a small mitigation experiment on a data subset: either a data-replay mix during fine-tuning or a LoRA merge with the base. Report whether the mitigation closes the gap, and write a 3-page safety memo with concrete recommendations for the next fine-tune cycle.
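The comparison step can be sketched as below. This is a minimal illustration, not the challenge's official harness: it assumes each benchmark yields one accuracy-style score in [0, 1] per seed, that seeds are independent, and that a higher score is better. The function names and the example numbers are hypothetical placeholders, not real results.

```python
from math import sqrt
from statistics import mean, stdev


def seed_average(scores):
    """Return (mean, standard error) of one model's per-seed scores on one benchmark."""
    m = mean(scores)
    # With a single seed there is no spread to estimate, so report 0.0.
    se = stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else 0.0
    return m, se


def largest_regression(base_runs, tuned_runs):
    """Return (benchmark, delta, standard error) for the most negative delta.

    delta = seed-averaged fine-tuned score minus seed-averaged base score.
    The two standard errors are combined assuming independent runs.
    """
    rows = []
    for name in base_runs:
        base_mean, base_se = seed_average(base_runs[name])
        tuned_mean, tuned_se = seed_average(tuned_runs[name])
        delta = tuned_mean - base_mean
        rows.append((name, delta, sqrt(base_se**2 + tuned_se**2)))
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Placeholder scores for illustration only (three seeds per benchmark).
    base_runs = {"mmlu_subset": [0.60, 0.62, 0.61], "gsm8k": [0.40, 0.42, 0.41]}
    tuned_runs = {"mmlu_subset": [0.59, 0.61, 0.60], "gsm8k": [0.30, 0.33, 0.31]}
    name, delta, se = largest_regression(base_runs, tuned_runs)
    print(f"largest regression: {name} (delta={delta:+.3f}, se={se:.3f})")
```

Reporting the standard error next to each delta makes it easier to tell a real regression from seed noise; a delta smaller than about two standard errors is weak evidence either way.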

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Audit a domain fine-tuned LLM for catastrophic forgetting, propose mitigations, and write the safety memo that informs the next fine-tune cycle.

Earning criteria — what you'll demonstrate

Design a catastrophic-forgetting audit for a domain fine-tune
Run multi-benchmark LLM evaluation with statistical rigor
Reason about mitigations (replay, merging, LoRA isolation)
Communicate safety findings to platform leadership
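To make the merging mitigation and the "does it close the gap" question concrete, here is a hedged sketch. It uses plain linear weight interpolation between base and fine-tuned parameters, which is one simple form of model merging and not necessarily the LoRA merge the challenge intends. Weights are represented as plain Python lists for illustration, and both function names are hypothetical.

```python
def interpolate_weights(base, tuned, alpha):
    """Per-parameter linear merge: (1 - alpha) * base + alpha * tuned.

    alpha = 0.0 returns the base weights, alpha = 1.0 the fine-tuned weights.
    Both arguments map parameter names to equal-length lists of floats.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != tuned.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name in base:
        if len(base[name]) != len(tuned[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * b + alpha * t
            for b, t in zip(base[name], tuned[name])
        ]
    return merged


def gap_closed(base_score, tuned_score, mitigated_score):
    """Fraction of the base-to-fine-tune regression recovered by a mitigation.

    1.0 means the mitigated model matches the base score; 0.0 means no
    recovery; negative values mean the mitigation made things worse.
    """
    gap = base_score - tuned_score
    if gap <= 0:
        raise ValueError("no regression to close: fine-tune is not below base")
    return (mitigated_score - tuned_score) / gap
```

When you report the fraction of the gap closed, also re-check the chemistry-domain score of the merged model, since a merge that recovers general ability can give back some of the domain gain.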

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Fine-Tuning Large Language Models

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

AI Safety Researcher
AI Research

Designing and running a catastrophic-forgetting audit on a production fine-tune is representative early work for an AI safety researcher at an organization that ships LLMs.

This challenge sharpens

catastrophic-forgetting
llm-evaluation
benchmarking

ML Researcher

Running mitigations like replay or model-merging and honestly reporting whether they close the gap is core ML-research work in industry labs.

This challenge sharpens

model-merging
fine-tuning
llm-evaluation

Machine Learning Engineer

Building a reproducible LLM evaluation harness that another engineer can rerun is the MLE craft of shipping evaluation as code.

This challenge sharpens

llm-evaluation
huggingface
benchmarking

One more thing

You can put a credential on your CV by Friday.
