Drug-Repurposing Candidate Screen with Embedding Similarity
Overview
What this challenge is about.
You receive (1) a list of 15 known therapeutic candidates (SMILES + ChEMBL identifiers) for a single rare disease and (2) a database of about 4,500 marketed drugs (SMILES + ATC codes). Compute molecular embeddings using two methods: Morgan fingerprints and ChemBERTa-style learned embeddings. For each method, rank the marketed drugs by similarity to the centroid of the known candidates. Produce a top-50 shortlist and annotate each entry with its primary ATC class and known indications. Discuss where the two methods agree and where they disagree. Deliver a 4-page memo for the medicinal-chemistry team.
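The centroid ranking step described above can be sketched as follows. This is a minimal illustration and not a prescribed solution: it assumes the embeddings have already been computed and are available as plain lists of floats, it uses cosine similarity (the brief does not fix a similarity measure; Tanimoto is a common alternative for bit-vector fingerprints), and the function names and toy data are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_centroid(candidate_vecs, library, top_k=50):
    """Rank library drugs by similarity to the candidate centroid.

    candidate_vecs: embeddings of the known candidates (assumed precomputed).
    library: dict mapping a drug identifier to its embedding.
    Returns the top_k (drug_id, similarity) pairs, most similar first.
    """
    c = centroid(candidate_vecs)
    scored = [(drug_id, cosine(vec, c)) for drug_id, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional embeddings, invented purely for illustration.
candidates = [[1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
library = {
    "drug_A": [1.0, 0.5, 1.0],
    "drug_B": [0.0, 1.0, 0.0],
    "drug_C": [1.0, 0.0, 0.0],
}
shortlist = rank_by_centroid(candidates, library, top_k=2)
```

Running the same function once per embedding method yields two rankings that can then be compared. Note that a single centroid assumes the known candidates form one coherent cluster in embedding space; if they span several distinct chemotypes, the mean may sit in a region that resembles none of them.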
The Brief
What you'll do, and what you'll demonstrate.
Build a two-method computational drug-repurposing screen and deliver an annotated top-50 shortlist medicinal chemists can discuss.
Earning criteria — what you'll demonstrate
- Apply molecular embeddings (chemoinformatic and neural) to a real screening question
- Run a centroid-based similarity ranking and reason about its assumptions
- Compare classical and neural embedding methods on the same task
- Frame computational-screen output respectfully for medicinal chemists
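One simple way to quantify where the classical and neural rankings agree is to compare their top-k sets. The sketch below is one possible approach, not a required one: it assumes each ranking is a list of drug identifiers ordered from most to least similar, and the function name, keys, and example identifiers are hypothetical.

```python
def topk_agreement(ranking_a, ranking_b, k=50):
    """Compare the top-k entries of two rankings of drug identifiers.

    Returns the shared hits, the hits unique to each method, and the
    Jaccard index (shared / union) of the two top-k sets.
    """
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    shared = top_a & top_b
    union = top_a | top_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(top_a - top_b),
        "only_b": sorted(top_b - top_a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Invented identifiers, for illustration only.
morgan_rank = ["d1", "d2", "d3", "d4"]
neural_rank = ["d2", "d5", "d1", "d6"]
report = topk_agreement(morgan_rank, neural_rank, k=3)
```

The hits unique to one method are often the most useful discussion points for the memo, since they show what each representation captures that the other does not.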
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Applied AI Scientist
Building computational-screen pipelines whose outputs a chemistry team can read is daily work for an applied AI scientist at an AI-forward drug-discovery startup.
This challenge sharpens
- molecular-embeddings
- similarity-search
- transformer
ML Researcher
Comparing classical chemoinformatic and learned neural embeddings on the same screening task is the kind of focused ML-research study small biotech labs value.
This challenge sharpens
- molecular-embeddings
- transfer-learning
- transformer
Data Scientist
Pairing a similarity pipeline with a respectful, chemist-readable memo is exactly the cross-functional data-scientist work biotechs hire for.
This challenge sharpens
- similarity-search
- exploratory-data-analysis
- molecular-embeddings