Drift-Detect + Self-Heal a Multi-Tenant Kubernetes Estate
Overview
What this challenge is about.
Enable ArgoCD auto-sync with selfHeal + prune across all 28 clusters. Add Kyverno cluster policies enforcing baseline standards (registry, resource limits, network policy). Build a drift dashboard (Grafana) showing per-cluster drift events + auto-heal counts. Define an exception process: a 'frozen' annotation per customer-specific manual fix, with a deadline for proper resolution. Apply to all 28 clusters. Deliver ArgoCD + Kyverno configs, the Grafana dashboard, and a 10-page writeup including the customer-comms plan for the drift cleanup.
The Brief
What you'll do, and what you'll demonstrate.
Roll out drift detection + auto-heal across 28 multi-tenant clusters with Kyverno policy enforcement and a drift-tracking dashboard the platform team can act on.
Earning criteria — what you'll demonstrate
- Apply ArgoCD self-heal at multi-tenant scale safely
- Use Kyverno policies for baseline enforcement across cluster fleets
- Design an exception process that gives manual fixes a deadline
- Communicate platform standardization to customer-success teams
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career mappings coming soon.