Tune a PPO Policy for an Energy-Storage Trading Bot

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive 18 months of 15-minute Nordic spot-price data, a battery dynamics model (capacity, round-trip efficiency, degradation curve), and a rule-based baseline that earns about EUR 42/MWh/day. Train a PPO policy in a custom Gym environment to choose charge/discharge actions as a continuous power setpoint. Backtest the trained policy on a held-out 3-month period and report mean daily profit, profit volatility, maximum drawdown, and the percentage of the degradation budget consumed. Then write a 2-page memo for the trading desk that explicitly addresses the risk of overfitting to the training period.
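As a rough illustration of the environment described above, here is a minimal Gym-style sketch in plain Python. It does not inherit from a real Gym class. The class name `BatteryEnv`, every default value (capacity, power rating, efficiency), and the flat per-MWh degradation charge are assumptions made for this example. The challenge's actual battery model and degradation curve may differ.

```python
from dataclasses import dataclass

DT_HOURS = 0.25  # 15-minute step, matching the price data in the brief


@dataclass
class BatteryEnv:
    """Hypothetical battery trading environment; all defaults are assumed values."""
    prices: list                          # EUR/MWh, one entry per 15-minute step
    capacity_mwh: float = 10.0            # assumed usable capacity
    max_power_mw: float = 5.0             # assumed power rating
    round_trip_eff: float = 0.90          # assumed round-trip efficiency
    degradation_eur_per_mwh: float = 4.0  # assumed flat stand-in for the degradation curve
    soc_mwh: float = 5.0                  # state of charge
    t: int = 0                            # current step index

    def _obs(self):
        # Observation: current price and state of charge as a fraction of capacity.
        i = min(self.t, len(self.prices) - 1)
        return (self.prices[i], self.soc_mwh / self.capacity_mwh)

    def reset(self):
        self.soc_mwh, self.t = self.capacity_mwh / 2, 0
        return self._obs(), {}

    def step(self, action):
        """action in [-1, 1]: negative charges, positive discharges."""
        one_way = self.round_trip_eff ** 0.5  # split losses evenly between charge and discharge
        power_mw = max(-1.0, min(1.0, action)) * self.max_power_mw
        if power_mw >= 0:
            # Discharge: energy sold is limited by what the cells can deliver.
            grid_mwh = min(power_mw * DT_HOURS, self.soc_mwh * one_way)
            cell_mwh = grid_mwh / one_way
            self.soc_mwh = max(0.0, self.soc_mwh - cell_mwh)
        else:
            # Charge: energy bought is limited by the remaining headroom.
            headroom = self.capacity_mwh - self.soc_mwh
            bought = min(-power_mw * DT_HOURS, headroom / one_way)
            cell_mwh = bought * one_way
            self.soc_mwh = min(self.capacity_mwh, self.soc_mwh + cell_mwh)
            grid_mwh = -bought
        # Reward: market revenue minus a cost for the energy cycled through the cells.
        reward = grid_mwh * self.prices[self.t] - self.degradation_eur_per_mwh * cell_mwh
        self.t += 1
        truncated = self.t >= len(self.prices)  # end of the price series
        return self._obs(), reward, False, truncated, {}
```

Charging the degradation cost inside the reward is one design choice among several. It pushes the policy to trade only when the price spread covers the wear it causes.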

Credential: Blockchain-anchored

Shareable: LinkedIn-ready

Language: English

Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Train and backtest a PPO bidding policy for grid-scale battery storage and quantify whether it beats the rule-based baseline net of degradation and risk.
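One possible way to compute the backtest figures for the held-out period is sketched below. The function name, argument names, and the choice to measure drawdown on cumulative profit starting from zero are assumptions for illustration, not part of the challenge specification.

```python
import statistics


def backtest_summary(daily_pnl, degradation_used, degradation_budget, baseline_daily):
    """Summarise a held-out backtest (hypothetical helper).

    daily_pnl: daily profit, net of degradation cost (needs at least two days)
    degradation_used, degradation_budget: same unit, e.g. equivalent full cycles
    baseline_daily: the rule-based baseline's mean daily profit, same unit as daily_pnl
    """
    equity, peak, max_drawdown = 0.0, 0.0, 0.0
    for pnl in daily_pnl:
        equity += pnl
        peak = max(peak, equity)                        # running high-water mark
        max_drawdown = max(max_drawdown, peak - equity)  # worst peak-to-trough loss
    mean_profit = statistics.fmean(daily_pnl)
    return {
        "mean_daily_profit": mean_profit,
        "profit_volatility": statistics.stdev(daily_pnl),  # sample standard deviation
        "max_drawdown": max_drawdown,
        "degradation_budget_pct": 100.0 * degradation_used / degradation_budget,
        "excess_over_baseline": mean_profit - baseline_daily,
    }
```

Reporting the excess over the baseline next to volatility and drawdown keeps the comparison honest: a higher mean profit earned with much larger swings may not be an improvement for the desk.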

Earning criteria — what you'll demonstrate

Implement and tune Proximal Policy Optimization on a continuous-control problem
Design a realistic RL environment around a physical system with degradation costs
Backtest a learned policy with held-out time periods to detect overfitting
Communicate RL results to a non-ML quant audience
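For the first criterion, it may help to see the clipped surrogate objective at the heart of PPO written out. The following is a plain-Python sketch for a small batch. The function name is hypothetical, and `clip_eps=0.2` is a commonly used default rather than a value prescribed by this challenge. In practice an RL library would compute this on tensors.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over a batch (illustrative only).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)  # negate: optimisers minimise
```

The clip range is one of the main knobs to tune. A smaller `clip_eps` gives more conservative policy updates, which can matter when the reward signal is as noisy as spot-price profit.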

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Deep Reinforcement Learning

Master · AI/ML

Fit score: 1

Skills

Skills you'll demonstrate.

Each one shows up on your verified credential.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Career paths this builds toward

Canonical roles

Applied AI Scientist
AI Research

Coupling RL training with rigorous backtests and a trader-facing memo is core applied-AI-scientist work at any quant or climate-tech firm.

This challenge sharpens

ppo
backtesting
risk-analysis

ML Researcher

Designing a faithful RL environment around a physical system with degradation costs is the kind of problem ML researchers tackle in industry research labs.

This challenge sharpens

policy-gradients
environment-design
reinforcement-learning

Data Scientist

Walk-forward evaluation and overfitting analysis on time-series data are data-science skills that transfer to any forecasting or trading role.

This challenge sharpens

backtesting
risk-analysis
environment-design

One more thing

You can put a credential on your CV by Friday.