Overview
What this challenge is about.
You receive a Sinergym wrapper around the EnergyPlus model of a single floor with 8 thermal zones, one year of weather data, and occupancy schedules. Train a Soft Actor-Critic agent (SAC, an off-policy actor-critic algorithm for continuous control) to set zone temperature setpoints, using a reward that combines energy use with a comfort penalty based on predicted-mean-vote (PMV) bounds. Evaluate the policy over 4 held-out seasons and report kWh saved relative to the rule-based controller, comfort violation minutes per week, and policy stability across seeds. Finally, write a safety memo that explains the failure modes and proposes guard rails for a pilot.
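As an illustration of the reward described above, here is a minimal sketch in plain Python. It is not the challenge's reference reward: the function names, the PMV band of -0.5 to +0.5, and both weights are assumptions chosen for the example, and the comfort term is only applied while the floor is occupied.

```python
def comfort_penalty(pmv, lower=-0.5, upper=0.5):
    """Return how far a PMV value lies outside [lower, upper]; zero inside the band.

    The band limits are illustrative assumptions, not values given by the challenge.
    """
    return max(lower - pmv, 0.0) + max(pmv - upper, 0.0)


def reward(energy_kwh, zone_pmvs, occupied, energy_weight=1.0, comfort_weight=10.0):
    """Negative weighted sum of energy use and the per-zone comfort penalty.

    energy_weight and comfort_weight are hypothetical tuning knobs.
    """
    penalty = sum(comfort_penalty(p) for p in zone_pmvs) if occupied else 0.0
    return -(energy_weight * energy_kwh + comfort_weight * penalty)
```

Raising `comfort_weight` relative to `energy_weight` pushes the agent toward comfort at the cost of energy, so the ratio between the two is one of the main things to tune.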
The Brief
What you'll do, and what you'll demonstrate.
Train a SAC HVAC policy that beats the rule-based controller on energy use without violating occupant comfort bounds, then propose guard rails for a pilot.
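The three reported metrics can be computed from logged episode data along these lines. This is a sketch under stated assumptions: the helper names, the fixed logging step, the PMV band, and the use of the sample standard deviation as a stability measure are all illustrative choices, not requirements of the challenge.

```python
import statistics

MINUTES_PER_WEEK = 7 * 24 * 60


def kwh_saved(baseline_kwh, policy_kwh):
    """Total energy saved by the policy relative to the rule-based baseline."""
    return sum(baseline_kwh) - sum(policy_kwh)


def violation_minutes_per_week(pmv_series, step_minutes, lower=-0.5, upper=0.5):
    """Average minutes per week in which PMV falls outside [lower, upper].

    Assumes one PMV sample per fixed-length timestep; the band is an assumption.
    """
    if not pmv_series:
        return 0.0
    violating_steps = sum(1 for p in pmv_series if p < lower or p > upper)
    weeks = len(pmv_series) * step_minutes / MINUTES_PER_WEEK
    return violating_steps * step_minutes / weeks


def seed_spread(savings_per_seed):
    """Sample standard deviation of kWh saved across seeds (needs at least 2 seeds)."""
    return statistics.stdev(savings_per_seed)
```

For example, a one-week log at 15-minute steps has 672 samples, so 4 violating samples correspond to 60 violation minutes per week.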
Earning criteria — what you'll demonstrate
- Implement and tune Soft Actor-Critic for continuous control
- Design a constrained reward balancing energy and comfort
- Evaluate policies seasonally on held-out weather
- Translate RL safety considerations into operational guard rails
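To make the last criterion concrete, one possible guard rail is a thin wrapper that sits between the learned policy and the building: it clamps each proposed setpoint to an approved range and hands control back to the rule-based controller when something looks wrong. The sketch below is hypothetical; the function name and every temperature limit (in degrees Celsius) are assumptions for illustration, and a real pilot would take its limits from the building operator.

```python
import math


def guarded_setpoint(proposed, fallback, zone_temp,
                     low=18.0, high=27.0, hard_low=16.0, hard_high=30.0):
    """Return a safe setpoint for one zone.

    proposed:  setpoint suggested by the learned policy
    fallback:  setpoint from the rule-based controller
    zone_temp: current measured zone temperature
    All limits are illustrative assumptions.
    """
    # Hand control to the rule-based controller if the policy output is not
    # a finite number or the zone has already drifted past the hard limits.
    if not math.isfinite(proposed) or not (hard_low <= zone_temp <= hard_high):
        return fallback
    # Otherwise accept the policy's action, clamped to the approved range.
    return min(max(proposed, low), high)
```

A wrapper like this keeps the safety logic outside the learned policy, so it can be reviewed and tested independently of any training run.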
Program Fit
Where this fits in your program.
Sharpens the same skills your degree expects you to demonstrate.
Skills
Skills you'll demonstrate.
Each one shows up on your verified credential.
Careers
Roles this prepares you for.
Real titles. Real skill bridges. Pick the one closest to your trajectory.
Machine Learning Engineer
Training a deep RL controller against a real simulator with strict safety bounds is the kind of work MLEs do at industrial and building-controls companies.
This challenge sharpens
- soft-actor-critic
- continuous-control
- simulation
AI Safety Researcher
Designing constrained rewards, failure-mode analyses, and operational guard rails for a learned controller is exactly the day-one work of AI safety researchers in applied settings.
This challenge sharpens
- safety-constraints
- actor-critic
- reinforcement-learning
Applied AI Scientist
Translating a research-grade SAC training run into a pilot-ready memo with quantified guard rails is core applied-AI-scientist work.
This challenge sharpens
- soft-actor-critic
- safety-constraints
- simulation