Kernel Methods vs. Deep Learning on a Tiny-Data Drug-Discovery Task
Overview
What this challenge is about.
You receive (or download) three public ADMET datasets from MoleculeNet (e.g., BBBP, Lipophilicity, FreeSolv). For each, train both (a) a Gaussian process with a Tanimoto kernel over Morgan fingerprints and (b) a small graph neural network (DimeNet or MPNN, 2-3 message-passing layers) under matched training budgets. Use nested cross-validation. Report RMSE for the regression datasets and AUC for the classification datasets, each with confidence intervals, plus the calibration error of the GP. The memo should recommend which method to default to as a function of dataset size and chemistry complexity.
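As a rough illustration of option (a), the sketch below computes a Tanimoto kernel and an exact GP regression posterior with NumPy. It assumes the Morgan fingerprints have already been computed elsewhere (for example with RDKit) and are passed in as 0/1 arrays. The names `tanimoto_kernel` and `gp_predict` and the fixed `noise_var` and `signal_var` values are illustrative assumptions, not part of the brief. A real submission would tune these hyperparameters, and a classification dataset such as BBBP would need a non-Gaussian likelihood.

```python
import numpy as np

def tanimoto_kernel(A, B):
    """Tanimoto similarity between rows of A and rows of B (0/1 fingerprints)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                              # on-bits shared by each pair
    na = (A * A).sum(axis=1)[:, None]            # on-bits in each row of A
    nb = (B * B).sum(axis=1)[None, :]            # on-bits in each row of B
    denom = na + nb - inter                      # on-bits in the union
    out = np.ones_like(inter)                    # convention: two empty fingerprints -> 1
    np.divide(inter, denom, out=out, where=denom > 0)
    return out

def gp_predict(X_train, y_train, X_test, noise_var=0.1, signal_var=1.0):
    """Exact GP regression with a Tanimoto kernel and a constant mean.

    Hyperparameters are fixed here for brevity (an assumption of this sketch).
    Returns the predictive mean and the predictive variance including noise.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()

    K = signal_var * tanimoto_kernel(X_train, X_train)
    K_s = signal_var * tanimoto_kernel(X_test, X_train)

    # Cholesky factorisation of the noisy training covariance.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(y_train)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))

    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    prior_diag = signal_var * np.ones(len(X_test))   # Tanimoto k(x, x) = 1
    var = prior_diag - (v * v).sum(axis=0) + noise_var
    return mean, np.maximum(var, 1e-12)              # guard against round-off
```

The predictive variance returned here is what the calibration analysis would later be checked against, so it includes the observation noise term.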
The Brief
What you'll do, and what you'll demonstrate.
Determine whether kernel methods or graph neural networks should be the team's default for small-molecule property prediction on tiny datasets.
Earning criteria — what you'll demonstrate
- Apply kernel methods (GPs) to a non-trivial real-world task
- Implement chemistry-aware kernels (Tanimoto over Morgan fingerprints)
- Compare kernel methods to graph neural networks on equal footing
- Reason about the small-data regime where kernel methods win
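One way to keep the comparison on equal footing is to run both models through the same nested cross-validation loop, so that hyperparameters are chosen only on inner folds and scored only on held-out outer folds. The sketch below is a minimal NumPy version under stated assumptions: `fit_predict(X_tr, y_tr, X_te, param)` is a hypothetical callable wrapping either model, folds are random (a scaffold-grouped splitter could replace `kfold_indices`), the metric is RMSE, and the interval is a simple percentile bootstrap over outer-fold scores, which is only approximate because folds share training data.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Return a list of (train_idx, test_idx) pairs for k random folds."""
    folds = np.array_split(rng.permutation(n), k)
    return [
        (np.concatenate([folds[j] for j in range(k) if j != i]), folds[i])
        for i in range(k)
    ]

def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def nested_cv(X, y, fit_predict, param_grid, outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate error; inner folds pick the hyperparameter."""
    rng = np.random.default_rng(seed)
    outer_scores = []
    for train, test in kfold_indices(len(y), outer_k, rng):
        inner = kfold_indices(len(train), inner_k, rng)

        def inner_score(param):
            # Mean validation RMSE using only the outer-training portion.
            return np.mean([
                rmse(y[train[va]],
                     fit_predict(X[train[tr]], y[train[tr]], X[train[va]], param))
                for tr, va in inner
            ])

        best = min(param_grid, key=inner_score)
        pred = fit_predict(X[train], y[train], X[test], best)
        outer_scores.append(rmse(y[test], pred))
    return np.array(outer_scores)

def bootstrap_ci(scores, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the mean of the outer-fold scores."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    means = rng.choice(scores, size=(n_boot, len(scores)), replace=True).mean(axis=1)
    lo, hi = np.quantile(means, [(1 - level) / 2, 1 - (1 - level) / 2])
    return float(lo), float(hi)
```

Passing the GP and the GNN through the same `nested_cv` call with the same seed gives both models identical splits, which is the main point of the matched-budget comparison.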
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Career paths this builds toward
Canonical roles
ML Researcher
A head-to-head methodology comparison in the small-data regime is a canonical applied-ML research project, and it is especially valuable in pharma/biotech.
This challenge sharpens
- kernel-methods
- graph-neural-networks
- cross-validation
Research Scientist
Scaffold-split protocol design and calibration analysis are the rigor markers research scientists are hired against.
This challenge sharpens
- gaussian-processes
- calibration
- cross-validation
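The calibration skill listed above can be made concrete for the regression datasets by comparing nominal and observed coverage of the GP's predictive intervals. The sketch below is one simple way to do that, assuming Gaussian predictive distributions given as a mean and variance per molecule. The function names and the chosen interval levels are illustrative assumptions, and a classification task such as BBBP would instead use a binned calibration error on predicted probabilities.

```python
from statistics import NormalDist

import numpy as np

def interval_coverage(y_true, mean, var, levels=(0.5, 0.8, 0.9, 0.95)):
    """Observed coverage of central Gaussian predictive intervals.

    Returns a list of (nominal_level, observed_fraction) pairs.
    """
    y_true = np.asarray(y_true, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.sqrt(np.asarray(var, dtype=float))
    z = np.abs(y_true - mean) / std              # standardised absolute residuals
    rows = []
    for level in levels:
        # Half-width of the central interval, in standard deviations.
        half_width = NormalDist().inv_cdf(0.5 + level / 2.0)
        rows.append((level, float(np.mean(z <= half_width))))
    return rows

def calibration_error(rows):
    """Mean absolute gap between nominal and observed coverage."""
    return float(np.mean([abs(observed - nominal) for nominal, observed in rows]))
```

Observed coverage well above the nominal level suggests the model is underconfident (intervals too wide), and coverage well below it suggests overconfidence.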