Analysis

Catastrophic-Forgetting Audit on a Domain Fine-Tune

Free · Verified credential · 2 weeks · Advanced

Overview

What this challenge is about.

You receive a fine-tuned 7B chemistry model and its base model, plus a basket of four benchmarks (an MMLU subset, GSM8K, IFEval, and a small instruction-following set). Run all four benchmarks on both models, averaging the results over multiple seeds. Identify the largest regression, then design and run a small mitigation experiment on a small subset: either a data-replay mix during fine-tuning or a LoRA merge with the base. Report whether the mitigation closes the gap, and write a 3-page safety memo with concrete recommendations for the next fine-tune cycle.
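The seed-averaging and "largest regression" steps above can be sketched roughly as follows. This is a minimal illustration, not the challenge's reference harness: the benchmark keys, the helper names `summarize` and `largest_regression`, and every score in the example are hypothetical placeholders, and the standard-error formula assumes the seeds are independent runs.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, standard error of the mean) over per-seed scores."""
    n = len(scores)
    se = stdev(scores) / sqrt(n) if n > 1 else 0.0
    return mean(scores), se


def largest_regression(base, tuned):
    """Compare seed-averaged scores and return the benchmark that dropped most.

    `base` and `tuned` map benchmark name -> list of per-seed scores.
    Returns (worst_benchmark, {benchmark: (delta, combined_standard_error)}),
    where delta = tuned mean - base mean (negative means a regression).
    """
    rows = {}
    for name in base:
        b_mean, b_se = summarize(base[name])
        t_mean, t_se = summarize(tuned[name])
        # Standard error of a difference of two independent means.
        rows[name] = (t_mean - b_mean, sqrt(b_se**2 + t_se**2))
    worst = min(rows, key=lambda k: rows[k][0])
    return worst, rows


# Placeholder numbers for illustration only; these are NOT real results.
base_scores = {
    "mmlu_subset": [0.60, 0.62, 0.61],
    "gsm8k": [0.40, 0.42, 0.41],
    "ifeval": [0.50, 0.52, 0.51],
    "instruction_set": [0.70, 0.71, 0.72],
}
tuned_scores = {
    "mmlu_subset": [0.59, 0.60, 0.61],
    "gsm8k": [0.30, 0.31, 0.32],
    "ifeval": [0.48, 0.50, 0.49],
    "instruction_set": [0.69, 0.70, 0.71],
}

worst, rows = largest_regression(base_scores, tuned_scores)
for name, (delta, se) in rows.items():
    print(f"{name:16s} delta={delta:+.3f} (SE {se:.3f})")
print("largest regression:", worst)
```

Reporting the standard error alongside each delta makes it easier to tell whether a drop is larger than seed-to-seed noise before a mitigation is designed around it.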

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Audit a domain fine-tuned LLM for catastrophic forgetting, propose mitigations, and write the safety memo that informs the next fine-tune cycle.

Earning criteria — what you'll demonstrate

  • Design a catastrophic-forgetting audit for a domain fine-tune
  • Run multi-benchmark LLM evaluation with statistical rigor
  • Reason about mitigations (replay, merging, LoRA isolation)
  • Communicate safety findings to platform leadership
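Two of the mitigations named in the criteria above, replay and merging, can be illustrated with a small sketch. It rests on stated assumptions: "replay" is taken to mean mixing a fraction of general-purpose examples back into the domain fine-tuning set, and "merging" is taken to mean linear interpolation between base and fine-tuned weights, which is only one of several merging schemes. The function names and the toy data are hypothetical, and weights are plain Python lists rather than real model tensors.

```python
import random


def replay_mix(domain, general, replay_fraction, seed=0):
    """Build a shuffled training list in which roughly `replay_fraction`
    of the examples are general-purpose (replay) data.

    All domain examples are kept; replay examples are sampled without
    replacement and capped at the size of `general`.
    """
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_replay = round(len(domain) * replay_fraction / (1.0 - replay_fraction))
    n_replay = min(n_replay, len(general))
    mixed = list(domain) + rng.sample(list(general), n_replay)
    rng.shuffle(mixed)
    return mixed


def interpolate_weights(base, tuned, alpha):
    """Linearly interpolate parameters: alpha=0 gives base, alpha=1 gives tuned.

    `base` and `tuned` map parameter name -> list of floats (a stand-in for
    tensors). Assumes both models share the same parameter names and shapes.
    """
    return {
        name: [(1.0 - alpha) * b + alpha * t
               for b, t in zip(base[name], tuned[name])]
        for name in base
    }


# Toy data for illustration only.
domain_examples = [f"chem-{i}" for i in range(80)]
general_examples = [f"gen-{i}" for i in range(100)]
mix = replay_mix(domain_examples, general_examples, replay_fraction=0.2)
print(len(mix), sum(x.startswith("gen-") for x in mix))

merged = interpolate_weights({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, alpha=0.5)
print(merged)
```

In either case the audit question stays the same: rerun the benchmark basket on the mitigated model and check whether the largest regression shrinks without giving back the domain gains.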

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

AI Safety Researcher

Designing and running a catastrophic-forgetting audit on a production fine-tune is representative of the work an AI safety researcher does at an organization that ships LLMs.

This challenge sharpens

  • catastrophic-forgetting
  • llm-evaluation
  • benchmarking

ML Researcher

Running mitigations like replay or model-merging and honestly reporting whether they close the gap is core ML-research work in industry labs.

This challenge sharpens

  • model-merging
  • fine-tuning
  • llm-evaluation

Machine Learning Engineer

Building a reproducible LLM evaluation harness that another engineer can rerun is the MLE craft of shipping evaluation as code.

This challenge sharpens

  • llm-evaluation
  • huggingface
  • benchmarking

One more thing

You can put a credential on your CV by Friday.

Catastrophic-Forgetting Audit on a Domain Fine-Tune | Ewance Challenge