Compare Kernel Methods to Trees on a Genomics Classification Task
Overview
What this challenge is about.
You receive a curated benchmark of about 12,000 labeled variants with roughly 120 numerical and 40 string features. Fit kernel SVMs (RBF, polynomial, and string kernels), a random forest, and XGBoost, using nested cross-validation so that hyperparameter tuning never sees the outer test folds. Report ROC-AUC, balanced accuracy, training and inference cost, and interpretability for each model. Identify the regimes (sample size, feature type) where each family wins. Deliver the experiment notebook, a results table, and a 5-page methodology draft suitable for an internal lab seminar.
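The nested cross-validation the brief asks for can be sketched as follows. This is a minimal sketch, assuming scikit-learn and a synthetic stand-in dataset from `make_classification`, since the actual benchmark, its column names, and its class balance are not specified here. The hyperparameter grids and fold counts are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the numerical features (assumption: not the real benchmark).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)

# Inner folds tune hyperparameters; outer folds estimate generalization.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Illustrative candidates and grids (assumed values).
candidates = {
    "rbf_svm": (
        # Scaling lives inside the pipeline so it is refit on each training fold.
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
    ),
    "random_forest": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"max_depth": [None, 5], "min_samples_leaf": [1, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # The grid search is itself the estimator evaluated by the outer loop,
    # so tuning only ever sees the outer training folds.
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=inner_cv)
    scores = cross_validate(
        search, X, y, cv=outer_cv,
        scoring=["roc_auc", "balanced_accuracy"],
    )
    results[name] = {
        "roc_auc": scores["test_roc_auc"].mean(),
        "balanced_accuracy": scores["test_balanced_accuracy"].mean(),
        "fit_time_s": scores["fit_time"].mean(),
    }

for name, summary in results.items():
    print(name, summary)
```

The design choice that matters is wrapping `GridSearchCV` inside `cross_validate`: reporting the best inner-loop score directly would reuse the same folds for selection and evaluation, which inflates the estimate.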
The Brief
What you'll do, and what you'll demonstrate.
Compare kernel methods to tree ensembles on a genomics classification benchmark, and document the regimes where each family wins.
Earning criteria — what you'll demonstrate
- Apply kernel methods (including string kernels) on a real benchmark
- Run nested cross-validation correctly to avoid optimistically biased performance estimates
- Compare model families across accuracy, cost, and interpretability
- Document regimes where each method family wins
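For the string-kernel criterion above, one common choice is the k-mer spectrum kernel, which counts shared substrings of length k. The brief does not say which string kernel to use, so this is a sketch under that assumption; the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Spectrum kernel: inner product of the two k-mer count vectors."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Iterate over the smaller dictionary; missing k-mers contribute zero.
    if len(counts_b) < len(counts_a):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def gram_matrix(seqs_a, seqs_b, k=3):
    """Kernel values for every pair (row from seqs_a, column from seqs_b)."""
    return [[spectrum_kernel(a, b, k) for b in seqs_b] for a in seqs_a]


# Toy sequences for illustration only (not taken from the benchmark).
train_seqs = ["ACGTACGT", "ACGTTTGA", "TTTTGGGG"]
K_train = gram_matrix(train_seqs, train_seqs, k=3)
```

A Gram matrix like `K_train` can be passed to scikit-learn's `SVC(kernel="precomputed")`. At prediction time the matrix must hold test sequences as rows and training sequences as columns.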
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
ML Researcher
A disciplined head-to-head comparison of model families, with regime analysis, is exactly the kind of work that lands at NeurIPS-adjacent workshops and is core ML-researcher craft.
This challenge sharpens
- kernel-methods
- tree-ensembles
- benchmarking
Research Scientist
Nested cross-validation and regime documentation are the kind of rigor expected from a junior research scientist in a bio-ML lab.
This challenge sharpens
- nested-cross-validation
- kernel-methods
- benchmarking
Data Scientist
Translating a model-family bake-off into actionable lab guidance is a senior data-science responsibility.
This challenge sharpens
- svm
- tree-ensembles
- nested-cross-validation