Compare Kernel SVMs and Gradient Boosting on Imbalanced Tabular Data
Overview
What this challenge is about.
You receive a 220,000-row anonymized loan-default dataset with mixed numeric and categorical features and a ~6% positive class. Train and evaluate two models: (1) an RBF-kernel SVM with a proper hyperparameter search and (2) a gradient-boosting baseline (LightGBM or XGBoost). Use nested cross-validation for the model-selection comparison, calibrate both models, and report AUROC, average precision, Expected Calibration Error (ECE), training time, and p99 inference latency. Defend a single recommendation in a 3-page memo for the head of engineering.
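To make the nested cross-validation requirement concrete, here is a minimal sketch for the RBF-kernel SVM using scikit-learn. It is an illustration under stated assumptions, not the challenge's reference solution: the data is a small synthetic stand-in generated with `make_classification` (roughly 6% positives), and the fold counts and the `C`/`gamma` grid are placeholders chosen so the example runs quickly. The inner loop picks hyperparameters; the outer loop scores the whole search procedure on folds it never tuned on.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the loan-default data (~6% positive class).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.94, 0.06], random_state=0
)

# Stratified folds keep the rare positive class present in every split.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

# Scaling lives inside the pipeline so it is refit on each training fold only.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))

# Assumption: a deliberately tiny illustrative grid.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}

# Inner loop: hyperparameter search scored by average precision.
search = GridSearchCV(svm, param_grid, scoring="average_precision", cv=inner_cv)

# Outer loop: each outer fold reruns the full search on its training portion.
outer_scores = cross_val_score(
    search, X, y, scoring="average_precision", cv=outer_cv
)
print(f"outer-fold average precision: {outer_scores.mean():.3f} "
      f"+/- {outer_scores.std():.3f}")
```

The same outer loop can wrap a search over a gradient-boosting model, so both candidates are compared on identical outer folds. On the full 220,000 rows, kernel SVM training cost grows quickly with sample size, so the grid and fold counts above would likely need rethinking.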
The Brief
What you'll do, and what you'll demonstrate.
Decide whether kernel SVMs still belong in the production credit-risk model zoo or whether gradient boosting wins on every axis that matters.
Earning criteria — what you'll demonstrate
- Apply nested cross-validation for honest model selection
- Compare kernel methods against tree-ensemble methods on real tabular data
- Quantify probability calibration with ECE and reliability diagrams
- Defend a model choice across statistical, calibration, and operational axes
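The calibration criterion above can be sketched in code. The function below is one common way to compute ECE for a binary classifier, assuming equal-width probability bins and comparing the mean predicted probability of the positive class with the observed positive rate in each bin. The function name, the choice of 10 bins, and the binning scheme are assumptions made for illustration; other ECE variants (for example equal-frequency bins) give different numbers.

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE with equal-width bins over the predicted positive probability."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin ids 0..n_bins-1,
    # with a probability of exactly 1.0 landing in the last bin.
    bin_ids = np.digitize(y_prob, edges[1:-1])

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # Gap between observed positive rate and mean predicted probability.
        gap = abs(y_true[mask].mean() - y_prob[mask].mean())
        # Weight the gap by the fraction of samples in this bin.
        ece += mask.mean() * gap
    return float(ece)

# Overconfident toy example: predicts 0.8 everywhere, but only 1 in 4 is positive.
print(expected_calibration_error([1, 0, 0, 0], [0.8, 0.8, 0.8, 0.8]))
```

The per-bin pairs of mean predicted probability and observed positive rate are the same points a reliability diagram plots, so the loop body can be reused to build one.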
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
Rigorous model-selection comparisons with nested CV and calibrated probability outputs are the bread and butter of ML research roles on any quant or fintech AI team.
This challenge sharpens
- kernel-methods
- nested-cross-validation
- model-selection
Applied AI Scientist
Trading off accuracy, calibration, and operational latency for a real engineering decision is exactly an applied AI scientist's daily reality.
This challenge sharpens
- model-selection
- model-calibration
- gradient-boosting
Data Scientist
Honest reporting on every axis (not just the winning metric) is what hiring managers look for in senior data-scientist candidates.
This challenge sharpens
- model-selection
- model-calibration
- gradient-boosting