Postmortem and Action-Item Tracking for a 3-Service Outage
Overview
What this challenge is about.
Rewrite the existing postmortem in the Google SRE blameless format: a timeline anchored to UTC, what went well, what went poorly, where we got lucky, and action items that are SMART, each with a named owner and a review date. Re-triage the 9 action items: kill the vague ones, split the oversized ones, and assign an owner to every item that survives. Design a 'postmortem follow-through' system: a weekly review by the engineering manager (EM) and the on-call lead, kill criteria for stuck items, and an exec-level dashboard. Deliverables: the rewritten postmortem, the re-triaged action-item list, a 5-page playbook, and a dashboard spec.
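One way to make the kill / split / assign pass repeatable is to write the rules down as code. The sketch below is illustrative only: the `ActionItem` fields, the `triage` function, and the 10-day size threshold are assumptions made for this example, not part of the challenge materials.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: items estimated above this are treated as "giant".
MAX_ESTIMATE_DAYS = 10


@dataclass
class ActionItem:
    title: str
    owner: Optional[str] = None          # a named person, not a team
    done_when: Optional[str] = None      # measurable completion criterion
    estimate_days: Optional[int] = None  # rough size estimate


def triage(item: ActionItem) -> str:
    """Return 'kill', 'split', 'assign', or 'keep' for one action item."""
    if not item.done_when:
        return "kill"    # vague: nobody can tell when it is finished
    if item.estimate_days is not None and item.estimate_days > MAX_ESTIMATE_DAYS:
        return "split"   # too large to finish before the next review
    if not item.owner:
        return "assign"  # well-defined but ownerless
    return "keep"


items = [
    ActionItem("Improve monitoring"),
    ActionItem("Rebuild the deploy pipeline", owner="sam",
               done_when="all 3 services deploy via the new pipeline",
               estimate_days=45),
    ActionItem("Add a queue-depth alert", done_when="alert fires in staging",
               estimate_days=2),
]
decisions = [triage(i) for i in items]
```

The order of the checks is a design choice: vagueness is tested first because an item with no completion criterion cannot be sized or owned in any meaningful way.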
The Brief
What you'll do, and what you'll demonstrate.
Rewrite a blame-tinged postmortem to blameless standards, salvage its 9 stuck action items, and install a postmortem follow-through system the team will actually honor.
Earning criteria — what you'll demonstrate
- Rewrite a blame-tinged postmortem to blameless, evidence-based standards
- Triage action items with kill / split / assign criteria
- Design a follow-through system so action items don't decay
- Communicate postmortem culture changes to a skeptical engineering team
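A follow-through system needs an unambiguous definition of "stuck" so that the weekly review can apply kill criteria without debate. The sketch below shows one possible rule, assuming a 7-day review cadence and a limit of three consecutive reviews without progress; the function name `is_stuck` and both defaults are hypothetical.

```python
from datetime import date


def is_stuck(last_progress: date, today: date,
             review_interval_days: int = 7,
             max_missed_reviews: int = 3) -> bool:
    """Flag an item whose last recorded progress is at least
    max_missed_reviews full review intervals old."""
    missed = (today - last_progress).days // review_interval_days
    return missed >= max_missed_reviews


# 21 days without progress is three full weekly reviews: flagged.
flagged = is_stuck(date(2024, 1, 1), date(2024, 1, 22))
# 20 days is only two full weekly reviews: not yet flagged.
not_flagged = is_stuck(date(2024, 1, 1), date(2024, 1, 21))
```

An item that is flagged would then go to the EM and on-call lead for an explicit decision (kill, split, or reassign) instead of rolling over silently.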
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.