Compare MDP Solvers for a Smart-Grid Battery Dispatch Pilot
Overview
What this challenge is about.
Model home-battery dispatch as a finite MDP: the state is (state-of-charge, hour-of-day, current price tier); the actions are charge, hold, and discharge, with realistic efficiency losses; transitions follow a stationary hourly price process estimated from public data; and the reward is the negative of the electricity cost. Implement value iteration, policy iteration, and tabular Q-learning. Compare their convergence speed, the expected daily savings each achieves, and their sensitivity to the discount factor. Document where each method shines and where it breaks, and recommend one for the team to operationalize. The pilot needs a method the team can defend to its first paying utility customer.
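A minimal sketch of the formulation and the two model-based solvers, assuming Python with NumPy. Every constant below is a hypothetical placeholder, not an estimate from public data: five SoC levels, two price tiers at $0.10 and $0.40 per kWh, 95% one-way efficiency, and a hand-picked peak-hour tier probability. The sketch also assumes that discharged energy offsets cost at the current price and ignores household load. `build_mdp`, `value_iteration`, and `policy_iteration` are illustrative names.

```python
import numpy as np

# Hypothetical toy parameters: placeholders, NOT estimated from public price data.
N_SOC, N_HOUR, N_TIER = 5, 24, 2     # SoC levels 0..4, hours 0..23, tiers (0 off-peak, 1 peak)
PRICE = np.array([0.10, 0.40])       # assumed $/kWh per price tier
ETA = 0.95                           # assumed one-way (charge or discharge) efficiency
DELTAS = (+1, 0, -1)                 # action 0 = charge, 1 = hold, 2 = discharge (one SoC unit)
N_S = N_SOC * N_HOUR * N_TIER


def idx(soc, hour, tier):
    """Flatten (state-of-charge, hour-of-day, price tier) into one state index."""
    return (soc * N_HOUR + hour) * N_TIER + tier


def p_peak(next_hour):
    """Assumed probability that the next hour is in the peak tier (placeholder numbers)."""
    return 0.8 if 17 <= next_hour < 21 else 0.1


def build_mdp():
    """Return P[a, s, s'] and R[s, a] for the toy dispatch MDP."""
    P = np.zeros((len(DELTAS), N_S, N_S))
    R = np.zeros((N_S, len(DELTAS)))
    for soc in range(N_SOC):
        for hour in range(N_HOUR):
            nh = (hour + 1) % N_HOUR
            q = p_peak(nh)
            for tier in range(N_TIER):
                s = idx(soc, hour, tier)
                for a, delta in enumerate(DELTAS):
                    new_soc = min(max(soc + delta, 0), N_SOC - 1)
                    moved = new_soc - soc  # 0 when the action is infeasible at a bound
                    # Charging draws moved/ETA from the grid; discharging offsets |moved|*ETA.
                    grid_kwh = moved / ETA if moved > 0 else moved * ETA
                    R[s, a] = -PRICE[tier] * grid_kwh  # reward = negative electricity cost
                    P[a, s, idx(new_soc, nh, 1)] += q
                    P[a, s, idx(new_soc, nh, 0)] += 1.0 - q
    return P, R


def value_iteration(P, R, gamma, tol=1e-8, max_iter=100_000):
    """Bellman optimality backups until the sup-norm change drops below tol."""
    V = np.zeros(R.shape[0])
    for k in range(1, max_iter + 1):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1), k
        V = V_new
    return V, Q.argmax(axis=1), max_iter


def policy_iteration(P, R, gamma, max_iter=1_000):
    """Exact policy evaluation (linear solve) alternated with greedy improvement."""
    n_s = R.shape[0]
    rows = np.arange(n_s)
    pi = np.ones(n_s, dtype=int)  # start from "always hold"
    for k in range(1, max_iter + 1):
        P_pi = P[pi, rows]            # (S, S): transition rows under the current policy
        R_pi = R[rows, pi]
        V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, R_pi)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        best = Q.argmax(axis=1)
        # Switch only on strict improvement so ties cannot make the loop cycle.
        keep = Q[rows, pi] >= Q[rows, best] - 1e-10
        new_pi = np.where(keep, pi, best)
        if np.array_equal(new_pi, pi):
            return V, pi, k
        pi = new_pi
    return V, pi, max_iter


if __name__ == "__main__":
    P, R = build_mdp()
    for gamma in (0.9, 0.95, 0.99):
        _, _, k_vi = value_iteration(P, R, gamma)
        _, _, k_pi = policy_iteration(P, R, gamma)
        print(f"gamma={gamma}: value iteration {k_vi} sweeps, policy iteration {k_pi} steps")
```

Policy iteration pays for an |S|-by-|S| linear solve on every step but usually needs far fewer outer iterations than value iteration. At 240 states that trade is cheap. It may stop being cheap once state-of-charge is discretized finely.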
The Brief
What you'll do, and what you'll demonstrate.
Solve a smart-grid battery dispatch MDP three ways and recommend one method for the team's roadmap.
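Of the three solvers, tabular Q-learning is the only one that never touches the transition matrix. It learns from sampled hourly transitions, and that is where sample efficiency enters the comparison. The following is a minimal sketch under hypothetical toy assumptions: five SoC levels, assumed tier prices, 95% one-way efficiency, and an invented peak-hour tier probability, none of them fitted to real data. `step` and `q_learning` are illustrative names, not a library API.

```python
import numpy as np

# Hypothetical toy constants (placeholders, not fitted to public price data).
N_SOC, N_HOUR, N_TIER, N_ACT = 5, 24, 2, 3
PRICE = (0.10, 0.40)   # assumed $/kWh: tier 0 = off-peak, tier 1 = peak
ETA = 0.95             # assumed one-way efficiency
DELTAS = (+1, 0, -1)   # action 0 = charge, 1 = hold, 2 = discharge


def step(rng, soc, hour, tier, a):
    """Sample one hourly transition; the learner sees only these samples, never P."""
    new_soc = min(max(soc + DELTAS[a], 0), N_SOC - 1)
    moved = new_soc - soc
    grid_kwh = moved / ETA if moved > 0 else moved * ETA
    reward = -PRICE[tier] * grid_kwh               # negative electricity cost
    next_hour = (hour + 1) % N_HOUR
    p_peak = 0.8 if 17 <= next_hour < 21 else 0.1  # assumed tier process
    next_tier = int(rng.random() < p_peak)
    return reward, new_soc, next_hour, next_tier


def q_learning(gamma=0.95, n_steps=100_000, alpha0=0.5, eps=0.1, seed=0):
    """Tabular Q-learning on one continuing trajectory with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_SOC, N_HOUR, N_TIER, N_ACT))
    visits = np.zeros_like(Q)
    soc, hour, tier = 0, 0, 0
    for _ in range(n_steps):
        if rng.random() < eps:
            a = int(rng.integers(N_ACT))          # explore
        else:
            a = int(Q[soc, hour, tier].argmax())  # exploit
        r, ns, nh, nt = step(rng, soc, hour, tier, a)
        visits[soc, hour, tier, a] += 1
        # Per-pair step size n^-0.6: sums diverge, squared sums converge (Robbins-Monro).
        alpha = alpha0 / visits[soc, hour, tier, a] ** 0.6
        target = r + gamma * Q[ns, nh, nt].max()
        Q[soc, hour, tier, a] += alpha * (target - Q[soc, hour, tier, a])
        soc, hour, tier = ns, nh, nt
    return Q, visits


if __name__ == "__main__":
    Q, visits = q_learning()
    print("state-action pairs never visited:", int((visits == 0).sum()))
```

The count of never-visited state-action pairs is one cheap sample-efficiency diagnostic. Value iteration and policy iteration have no analogue, because they update every state on each sweep.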
Earning criteria — what you'll demonstrate
- Formulate a real-world control problem as an MDP
- Implement and contrast classical and learning-based MDP solvers
- Reason about convergence and sample efficiency trade-offs
- Write a technical comparison for a graduate-level audience
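For the convergence and discount-factor criteria, a standard starting point is that value iteration is a γ-contraction in the sup norm. With V_0 = 0 and per-step rewards bounded by r_max, it follows that ‖V_k − V*‖∞ ≤ γ^k · r_max / (1 − γ). The minimal sketch below turns that worst-case bound into an iteration count so the growth as γ → 1 is visible. The function names and the r_max value are illustrative assumptions. Real sweep counts under a stopping rule on successive iterates are typically lower than this bound.

```python
import math


def vi_iteration_bound(gamma: float, r_max: float, eps: float) -> int:
    """Smallest k with gamma**k * r_max / (1 - gamma) <= eps, assuming V_0 = 0 and |r| <= r_max."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    v_max = r_max / (1.0 - gamma)  # sup-norm bound on the optimal value function
    if v_max <= eps:
        return 0
    return math.ceil(math.log(eps / v_max) / math.log(gamma))


def effective_horizon(gamma: float) -> float:
    """Rough number of hours the discounted objective looks ahead."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # Assumed reward bound: charging one unit at an assumed $0.40/kWh with 95% efficiency.
    r_max = 0.40 / 0.95
    for gamma in (0.9, 0.95, 0.99, 0.999):
        k = vi_iteration_bound(gamma, r_max, 1e-6)
        print(f"gamma={gamma}: horizon ~{effective_horizon(gamma):.0f} h, VI bound {k} sweeps")
```

The bound grows roughly like log(1/ε) / (1 − γ), so pushing γ toward 1 to capture multi-day arbitrage makes value iteration noticeably slower. The same push also lengthens the horizon Q-learning must propagate credit across.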
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Applied AI Scientist
Translating a control problem into an MDP and choosing a method defensibly is core applied-AI work in any energy-tech team.
This challenge sharpens
- markov-decision-processes
- reinforcement-learning
- value-iteration
ML Researcher
Comparing classical and learning-based MDP solvers with an honest convergence analysis is exactly the kind of work ML researchers cut their teeth on.
This challenge sharpens
- q-learning
- value-iteration
- policy-iteration
Machine Learning Engineer
Reproducible, defensible, well-instrumented experiments are the MLE's deliverable when productionizing an RL method.
This challenge sharpens
- python
- q-learning
- reinforcement-learning