Reproducible Patient-Cohort Analysis for a Pharma AI Vendor
Overview
What this challenge is about.
You receive a written cohort definition (patients with type 2 diabetes, aged 40 to 70, on metformin for at least 90 days) and a target output: the distribution of 12-month HbA1c change plus a Kaplan-Meier curve for time to treatment intensification. Use the public Synthea-generated synthetic dataset (no real patient data, fully accessible). Build the analysis with a clear data-access script, an environment lockfile, and a one-command run that produces identical outputs on any machine. Document the cohort-definition decisions an auditor would scrutinize.
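To make the cohort definition concrete, here is a minimal sketch of the inclusion rule in Python. The `Patient` record and its field names are hypothetical (real Synthea exports spread this information across several tables that would need joining first), and two choices are assumptions rather than part of the brief: age is evaluated at metformin start, and the 40 to 70 bounds are inclusive. These are the kind of decisions an auditor would expect to see written down.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical, simplified record for illustration only.
@dataclass(frozen=True)
class Patient:
    patient_id: str
    birth_date: date
    has_t2d: bool
    metformin_start: Optional[date]
    metformin_end: Optional[date]  # None means still on treatment at cut-off

MIN_EXPOSURE_DAYS = 90
MIN_AGE, MAX_AGE = 40, 70


def age_on(birth: date, when: date) -> int:
    """Age in completed years on a given date."""
    had_birthday = (when.month, when.day) >= (birth.month, birth.day)
    return when.year - birth.year - (0 if had_birthday else 1)


def in_cohort(p: Patient, data_cutoff: date) -> bool:
    """Apply the written cohort definition to one patient."""
    if not p.has_t2d or p.metformin_start is None:
        return False
    end = p.metformin_end or data_cutoff
    exposure_days = (end - p.metformin_start).days
    # Assumption: age is measured at metformin start, bounds inclusive.
    age = age_on(p.birth_date, p.metformin_start)
    return exposure_days >= MIN_EXPOSURE_DAYS and MIN_AGE <= age <= MAX_AGE
```

Keeping the thresholds as named constants means each cohort decision has a single place in the code that the written documentation can point to.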
The Brief
What you'll do, and what you'll demonstrate.
Rebuild a diabetes-cohort outcome analysis as a fully reproducible, auditable artifact on a public synthetic dataset.
Earning criteria — what you'll demonstrate
- Implement a reproducible analysis pipeline from raw data to chart
- Apply standard epidemiological methods (cohort definition, Kaplan-Meier)
- Document analytical choices to a level an external auditor accepts
- Use synthetic data responsibly to mirror a real clinical workflow
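As an illustration of the Kaplan-Meier method named in the criteria above, the following is a minimal pure-Python sketch of the product-limit estimator. It is not a prescribed implementation; a survival-analysis library would normally be used in practice. The function name and input shape are assumptions: `durations` holds follow-up time per patient and `events` marks whether treatment intensification was observed (`True`) or the patient was censored (`False`).

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # everyone exits the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients at t both leave the risk set.
        at_risk -= leaving[t]
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])` yields survival estimates of roughly 0.8, 0.6, and 0.3 at times 1, 2, and 3. Censored patients lower the number at risk without producing a step in the curve.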
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Data Scientist
Reproducible cohort and survival analysis on patient-like data is exactly the portfolio piece a junior data scientist at a pharma-AI or healthtech vendor wants to show.
This challenge sharpens
- cohort-analysis
- survival-analysis
- reproducible-analysis
Applied AI Scientist
Auditable analyses with rigorous decision logs are how applied AI scientists earn trust with regulated-industry stakeholders.
This challenge sharpens
- reproducible-analysis
- documentation
- cohort-analysis
MLOps Engineer
Environment pinning, one-command runs, and audit trails are the same disciplines MLOps engineers apply to production model pipelines.
This challenge sharpens
- reproducible-analysis
- documentation
- python
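One way to back up a claim of identical outputs across machines is to hash every output file into a manifest and compare manifests between runs. The sketch below uses only the Python standard library; the function names and the manifest layout are assumptions for illustration, not part of the challenge specification.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir: Path) -> dict:
    """Map each output file's relative path to its SHA-256 digest."""
    files = sorted(p for p in output_dir.rglob("*") if p.is_file())
    return {p.relative_to(output_dir).as_posix(): sha256_of(p) for p in files}


def write_manifest(output_dir: Path, manifest_path: Path) -> dict:
    """Write the manifest as sorted JSON; keep it outside output_dir."""
    manifest = build_manifest(output_dir)
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    manifest_path.write_text(text, encoding="utf-8")
    return manifest
```

Two runs match when their manifests are equal. Rendered charts may embed metadata that varies between environments, so hashing the underlying result tables is likely a more robust check than hashing image files.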