Frequent-Itemset Mining on a Grocery Retailer's Basket History
Overview
What this challenge is about.
Load 18 months of basket-level transaction data (Parquet, around 92 GB) into a Spark cluster. Run FP-growth with minimum-support thresholds tuned per category (food, household, fresh). Keep itemsets whose lift over baseline co-occurrence exceeds 1.4 and whose association is stable in at least 9 of the 18 months. Propose 10 shelf-adjacency changes, each with an expected basket-size lift, ranked by ease of physical implementation. Deliver an analysis notebook, an 8-page recommendation memo, and a follow-up A/B test plan for 6 stores.
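The lift and stability filters can be illustrated on a small scale. The sketch below is plain Python rather than Spark, and it rests on two assumptions that the brief does not spell out: "baseline co-occurrence" is taken to mean the co-occurrence expected if the two items were bought independently, and a pair counts as "stable" when its lift clears the threshold in at least 9 separate calendar months. The function names `pair_lifts` and `stable_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_lifts(baskets):
    """Return {(a, b): lift} for every item pair that co-occurs in `baskets`.

    lift(a, b) = P(a and b) / (P(a) * P(b))
               = count(a, b) * n / (count(a) * count(b))
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate; sort so pair keys are canonical
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): count * n / (item_counts[a] * item_counts[b])
        for (a, b), count in pair_counts.items()
    }


def stable_pairs(baskets_by_month, min_lift=1.4, min_months=9):
    """Return pairs whose lift exceeds `min_lift` in at least `min_months` months.

    Assumption: stability is judged month by month, not on pooled data.
    """
    months_passing = Counter()
    for baskets in baskets_by_month.values():
        for pair, lift in pair_lifts(baskets).items():
            if lift > min_lift:
                months_passing[pair] += 1
    return {pair for pair, months in months_passing.items() if months >= min_months}
```

At the scale described here the same counts would come from Spark aggregations, but the arithmetic is identical: a lift of 1.4 means the pair appears together 40% more often than independent purchasing would predict.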
The Brief
What you'll do, and what you'll demonstrate.
Mine 240 million baskets to propose 10 evidence-backed shelf-adjacency changes with expected basket-size lift and a follow-up A/B test plan.
Earning criteria — what you'll demonstrate
- Run FP-growth at scale on real basket data, not toy market-basket examples
- Tune support thresholds per category to avoid drowning in trivial itemsets
- Distinguish lift from raw co-occurrence and pick stable patterns
- Translate itemset patterns into operational store-layout decisions
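As a rough illustration of per-category support tuning, the sketch below filters already-counted itemsets against category-specific minimum supports. Everything specific in it is an assumption: the threshold values are placeholders, the `category_of` mapping is hypothetical, and the rule for mixed-category itemsets (apply the lowest threshold among the categories involved) is only one reasonable choice.

```python
def filter_by_category_support(itemset_counts, n_baskets, category_of, min_support):
    """Keep itemsets whose relative support meets their category threshold.

    itemset_counts: {frozenset of items: number of baskets containing all of them}
    category_of:    {item: category name}            (hypothetical mapping)
    min_support:    {category name: relative threshold}  (assumed values)

    Assumption: a mixed-category itemset is held to the lowest threshold
    among its items' categories, so slower-moving categories are not
    filtered out by a threshold tuned for fast-moving ones.
    """
    kept = {}
    for itemset, count in itemset_counts.items():
        threshold = min(min_support[category_of[item]] for item in itemset)
        support = count / n_baskets
        if support >= threshold:
            kept[itemset] = support
    return kept
```

Spark's `FPGrowth` estimator takes a single `minSupport`, so one way to apply a rule like this is to mine at the lowest threshold and filter afterwards; mining each category separately is an alternative, at the cost of missing cross-category itemsets.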
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Product Manager
Category-management PMs who can read mining output unlock evidence-based merchandising decisions.
This challenge sharpens
- lift-analysis
- ab-testing
- data-storytelling