Spectral Clustering for Customer Segmentation at a SaaS Company
Overview
What this challenge is about.
You receive a 12,000-customer × 220-feature usage matrix (weekly counts per feature, averaged over 12 weeks). Construct a similarity graph (k-nearest-neighbors with 15 neighbors, Gaussian-RBF weights), compute the normalized Laplacian, extract the eigenvectors belonging to its k smallest eigenvalues, and run k-means in the embedded space. Choose the cluster count k with the eigengap heuristic plus silhouette scoring. Compare against k-means and DBSCAN run on the raw matrix, using normalized mutual information (NMI) on a hand-labeled 200-customer test set. Profile sparse-matrix performance. Deliver code, cluster interpretations, and a 6-page recommendation memo.
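The pipeline described above can be sketched in a few functions. This is a minimal illustration, not a reference solution. It assumes NumPy, SciPy, and scikit-learn are available. The function names (`knn_rbf_affinity`, `smallest_laplacian_eigs`, `eigengap_k`, `spectral_labels`), the median-distance bandwidth, and the row normalization of the embedding are illustrative choices that the brief does not prescribe.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph


def knn_rbf_affinity(X, n_neighbors=15, sigma=None):
    """Symmetric sparse kNN graph with Gaussian-RBF edge weights."""
    dist = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance",
                            include_self=False)
    if sigma is None:
        # Assumption: use the median kNN distance as the RBF bandwidth.
        sigma = np.median(dist.data)
    W = dist.copy()
    W.data = np.exp(-(W.data ** 2) / (2.0 * sigma ** 2))
    # Keep an edge if either endpoint lists the other as a neighbor.
    return W.maximum(W.T).tocsr()


def smallest_laplacian_eigs(W, n_eigs, seed=0):
    """Smallest eigenpairs of L_sym = I - D^{-1/2} W D^{-1/2}."""
    deg = np.asarray(W.sum(axis=1)).ravel()
    d_inv_sqrt = diags(1.0 / np.sqrt(deg))
    S = d_inv_sqrt @ W @ d_inv_sqrt
    # The largest eigenvalues mu of S correspond to the smallest eigenvalues
    # 1 - mu of L_sym; ARPACK is generally better at the large end.
    v0 = np.random.default_rng(seed).uniform(-1.0, 1.0, W.shape[0])
    mu, vecs = eigsh(S, k=n_eigs, which="LA", v0=v0)
    order = np.argsort(-mu)
    return 1.0 - mu[order], vecs[:, order]


def eigengap_k(eigvals):
    """Cluster count suggested by the largest gap between sorted eigenvalues."""
    return int(np.argmax(np.diff(eigvals))) + 1


def spectral_labels(vecs, n_clusters, seed=0):
    """Row-normalize the first n_clusters eigenvectors, then run k-means."""
    U = vecs[:, :n_clusters]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(U)
```

A typical call sequence would be `W = knn_rbf_affinity(X)`, then `vals, vecs = smallest_laplacian_eigs(W, 20)`, then `labels = spectral_labels(vecs, eigengap_k(vals))`. The eigengap suggestion is only a starting point; the brief also asks for a silhouette check, which `sklearn.metrics.silhouette_score` can provide on the embedded points.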
The Brief
What you'll do, and what you'll demonstrate.
Apply spectral clustering to a 12k-customer usage matrix and decide whether it beats simpler baselines enough to justify the operational complexity.
Earning criteria — what you'll demonstrate
- Implement spectral clustering from a similarity graph end-to-end
- Use the eigengap heuristic to choose cluster count
- Benchmark against simpler baselines using a labeled test set
- Recommend for or against the more complex algorithm based on measured lift over the baselines
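The benchmarking criterion can be made concrete with a small scoring harness. The sketch below assumes scikit-learn. The names `baseline_labelings` and `nmi_on_test_set`, and the `eps` and `min_samples` values, are hypothetical and need tuning for the real usage matrix. It clusters all customers, then scores each labeling only on the hand-labeled subset.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import normalized_mutual_info_score


def baseline_labelings(X, n_clusters, eps, min_samples=5, seed=0):
    """Cluster the raw matrix with the two baselines named in the brief."""
    return {
        "kmeans_raw": KMeans(n_clusters=n_clusters, n_init=10,
                             random_state=seed).fit_predict(X),
        # DBSCAN marks noise points with -1; NMI treats them as one group.
        "dbscan_raw": DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X),
    }


def nmi_on_test_set(labelings, test_idx, test_labels):
    """Score each full labeling against the hand labels of the test rows."""
    return {
        name: float(normalized_mutual_info_score(test_labels, labels[test_idx]))
        for name, labels in labelings.items()
    }
```

Adding the spectral result is a matter of inserting one more entry, for example `labelings["spectral"] = my_spectral_labels`, before scoring. Because all methods are scored on the same 200 rows, the NMI differences reflect the methods themselves, though with a test set this small it is worth reporting the spread across random seeds as well.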
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Product Manager
Technical product managers who can review a clustering analysis and push back on weak baselines build trust with their data engineers.
This challenge sharpens
- algorithm-analysis
- benchmarking
- data-structures