Re-Architect a Data-Center Fabric for an AI-Training Workload
Overview
What this challenge is about.
You receive the current three-tier topology diagram, four weeks of sFlow traffic captures from the existing fabric, and the rack-level requirements for the four new H100 racks (200 Gbps per server, RoCEv2). Design a non-blocking leaf-spine Clos fabric of 32 leaf and 8 spine switches with 200G leaf-spine links. Choose between BGP EVPN with a VXLAN overlay and plain eBGP unnumbered with ECMP, and defend the choice in writing. Build a Containerlab simulation of a 4-leaf, 2-spine slice running the chosen routing design on FRR, demonstrate ECMP load balancing with iperf3, and prove that the slice reconverges in under 200 ms after a single-link failure. Deliver topology diagrams, routing configs, the Containerlab repo, a benchmark report, and a 6-page design memo for the network architect to sign off.
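The non-blocking requirement comes down to simple arithmetic: a leaf is non-blocking when its server-facing bandwidth does not exceed its spine-facing bandwidth. The sketch below works that out for the numbers in this brief. It assumes exactly one 200G link between each leaf and each spine and a single 200 Gbps attachment per server; the class and method names are illustrative and not part of any tool named in the challenge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafSpineFabric:
    leaves: int
    spines: int
    uplink_gbps: float
    # Assumption: one physical link per leaf-spine pair.
    links_per_leaf_spine_pair: int = 1

    def uplink_capacity_per_leaf(self) -> float:
        """Total spine-facing bandwidth of one leaf, in Gbps."""
        return self.spines * self.links_per_leaf_spine_pair * self.uplink_gbps

    def max_nonblocking_servers_per_leaf(self, server_gbps: float) -> int:
        """Largest server count per leaf that keeps the ratio at or below 1:1."""
        return int(self.uplink_capacity_per_leaf() // server_gbps)

    def oversubscription(self, servers_per_leaf: int, server_gbps: float) -> float:
        """Downlink-to-uplink ratio; 1.0 or lower means non-blocking."""
        return servers_per_leaf * server_gbps / self.uplink_capacity_per_leaf()

    def spine_ports_required(self) -> int:
        """Leaf-facing ports each spine must provide."""
        return self.leaves * self.links_per_leaf_spine_pair


fabric = LeafSpineFabric(leaves=32, spines=8, uplink_gbps=200)

uplink = fabric.uplink_capacity_per_leaf()                 # 8 x 200G = 1600 Gbps
max_servers = fabric.max_nonblocking_servers_per_leaf(200)  # 1600 / 200 = 8
ratio_at_16 = fabric.oversubscription(16, 200)              # 3200 / 1600 = 2.0
spine_ports = fabric.spine_ports_required()                 # one port per leaf = 32

print(f"uplink per leaf: {uplink:.0f} Gbps")
print(f"max non-blocking servers per leaf: {max_servers}")
print(f"oversubscription with 16 servers per leaf: {ratio_at_16:.1f}:1")
print(f"leaf-facing ports per spine: {spine_ports}")
```

Under these assumptions each leaf can host at most 8 servers at 200 Gbps before the fabric becomes oversubscribed. If the rack requirements call for more servers per leaf, the design needs more spines, faster uplinks, or parallel links per leaf-spine pair.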
The Brief
What you'll do, and what you'll demonstrate.
Design a leaf-spine data-center fabric that handles AI-training east-west traffic, justify the routing-protocol choice, and prove the design in a Containerlab simulation before capital expenditure is committed.
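ECMP spreads traffic per flow, not per packet: the switch hashes header fields and uses the result to pick one of the equal-cost next hops. A minimal sketch of that idea follows. The CRC32 hash, the addresses, and the spine names are assumptions chosen for illustration; real switches and the Linux kernel use their own hash functions and field selections. The only protocol fact relied on is that RoCEv2 runs over UDP with destination port 4791, so the source port is the main source of per-flow entropy.

```python
import zlib
from collections import Counter

# A flow is identified here by the classic 5-tuple:
# (source IP, destination IP, IP protocol, source port, destination port).


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop by hashing the 5-tuple (illustrative hash, not a vendor's)."""
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Hypothetical 2-spine slice, matching the size of the simulation in this brief.
spines = ["spine1", "spine2"]

# 1000 RoCEv2-style flows between two hosts: UDP (protocol 17), destination
# port 4791, and a varying source port.
flows = [("10.0.1.10", "10.0.2.10", 17, 49152 + i, 4791) for i in range(1000)]

load = Counter(ecmp_next_hop(flow, spines) for flow in flows)
for spine in spines:
    print(f"{spine}: {load[spine]} flows")
```

Because the choice is per flow, a handful of very large flows can land on the same link even when the flow counts look even. That is why a test with many parallel iperf3 streams gives a more useful picture of load balancing than a single stream.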
Earning criteria — what you'll demonstrate
- Apply Clos fabric design to a realistic AI-training datacenter scenario
- Compare BGP EVPN/VXLAN vs. eBGP-unnumbered against concrete requirements
- Validate routing convergence and ECMP behavior in simulation
- Write a design memo a network architect will actually sign
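One way to validate convergence is to send probes at a fixed rate across the link that will be failed and look for the longest silence at the receiver. The sketch below assumes you already have receive timestamps in milliseconds from such a probe stream; the function name, the 1 ms probe interval, and the synthetic data are illustrative, and the 200 ms budget is the figure stated in the overview.

```python
from typing import Sequence


def outage_ms(rx_times_ms: Sequence[float], send_interval_ms: float) -> float:
    """Estimate the traffic outage from the longest gap between received probes.

    The normal spacing between probes is subtracted, so an uninterrupted
    stream reports an outage of 0 ms.
    """
    if len(rx_times_ms) < 2:
        raise ValueError("need at least two received probes")
    longest_gap = max(b - a for a, b in zip(rx_times_ms, rx_times_ms[1:]))
    return float(max(0.0, longest_gap - send_interval_ms))


# Synthetic example: probes sent every 1 ms for one second, with the probes
# sent at t = 400..519 ms lost while the fabric reconverges.
received = list(range(0, 400)) + list(range(520, 1000))

outage = outage_ms(received, send_interval_ms=1)
budget_ms = 200  # convergence target from the brief
within_budget = outage < budget_ms

print(f"estimated outage: {outage:.0f} ms, within budget: {within_budget}")
```

The resolution of this estimate is limited by the probe interval, and probes that ECMP happens to hash onto the surviving path will show no loss at all, so the probe flow has to be confirmed on the failed link before the test.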
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.