Code

Use Actor-Critic to Auto-Tune an HVAC Control Policy

Free · Verified credential · 3 weeks · Advanced

Overview

What this challenge is about.

You receive a Sinergym wrapper around an EnergyPlus model of one floor with 8 thermal zones, one year of weather data, and occupancy schedules. Train a Soft Actor-Critic agent (SAC, an off-policy actor-critic algorithm for continuous control) whose actions are temperature setpoints, using a reward that combines energy use with a comfort penalty for leaving predicted mean vote (PMV) bounds. Evaluate over 4 held-out seasons and report kWh saved versus the rule-based controller, comfort violation minutes per week, and policy stability across random seeds. Finally, write a safety memo that explains failure modes and proposes guard rails for a pilot.
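One way to make the energy/comfort trade-off concrete is a per-step reward like the minimal sketch below. It assumes a comfort band of PMV between -0.5 and +0.5 (the band commonly cited from ASHRAE 55 and ISO 7730), penalizes only occupied zones in proportion to how far PMV leaves the band, and uses hypothetical names and weights (`hvac_reward`, `comfort_penalty`, `RewardWeights`). It is a standalone illustration, not Sinergym's own reward API, and the weights would need tuning.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed comfort band: PMV between -0.5 and +0.5 (illustrative, not from the challenge).
PMV_LOW, PMV_HIGH = -0.5, 0.5


@dataclass(frozen=True)
class RewardWeights:
    energy: float = 1.0    # hypothetical penalty per kWh consumed in the step
    comfort: float = 10.0  # hypothetical penalty per PMV unit outside the band


def comfort_penalty(pmv_per_zone: Sequence[float], occupied: Sequence[bool]) -> float:
    """Sum, over occupied zones, of how far PMV falls outside [PMV_LOW, PMV_HIGH]."""
    total = 0.0
    for pmv, is_occupied in zip(pmv_per_zone, occupied):
        if is_occupied:
            # At most one of these two terms is positive for a given zone.
            total += max(0.0, PMV_LOW - pmv) + max(0.0, pmv - PMV_HIGH)
    return total


def hvac_reward(
    energy_kwh: float,
    pmv_per_zone: Sequence[float],
    occupied: Sequence[bool],
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Negative weighted cost: values closer to zero are better for the agent."""
    return -(weights.energy * energy_kwh
             + weights.comfort * comfort_penalty(pmv_per_zone, occupied))


# Example step on the 8-zone floor: two occupied zones sit outside the band.
reward = hvac_reward(2.0, [0.0] * 6 + [0.7, -0.8], [True] * 8)
print(reward)
```

A fixed weighted sum like this only discourages discomfort; if comfort bounds are treated as a hard constraint, a Lagrangian-style scheme that adapts the comfort weight during training is one alternative.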

Credential: Blockchain-anchored
Shareable: LinkedIn-ready
Language: English
Pace: Self-paced

The Brief

What you'll do, and what you'll demonstrate.

Train a SAC HVAC policy that beats the rule-based controller on energy use while never violating occupant comfort bounds, and propose pilot guard rails.
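For the pilot guard rails, one common pattern is a thin safety filter between the learned policy and the building: clamp each proposed setpoint to hard limits, rate-limit how fast it can move, and hand control back to the rule-based controller when the action is invalid or comfort is already badly violated. The sketch below assumes a single setpoint per zone, and every limit in it (18 to 26 °C, 1 °C per step, a |PMV| trip threshold of 1.0) as well as the class name `SetpointGuardRail` is a hypothetical placeholder, not a value from the challenge.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SetpointGuardRail:
    """Hypothetical safety layer between the SAC policy and one zone's setpoint."""
    min_c: float = 18.0        # absolute lower setpoint bound (°C), assumed
    max_c: float = 26.0        # absolute upper setpoint bound (°C), assumed
    max_step_c: float = 1.0    # largest allowed change per control step (°C), assumed
    trip_abs_pmv: float = 1.0  # above this |PMV|, fall back to rule-based control

    def clamp(self, proposed_c: float, previous_c: float) -> float:
        # Bring the previous setpoint inside the absolute bounds first so the
        # rate-limit window [lo, hi] below is never empty.
        prev = min(max(previous_c, self.min_c), self.max_c)
        lo = max(self.min_c, prev - self.max_step_c)
        hi = min(self.max_c, prev + self.max_step_c)
        return min(max(proposed_c, lo), hi)

    def apply(self, proposed_c: float, previous_c: float,
              rule_based_c: float, worst_abs_pmv: float) -> float:
        # Fallback: non-finite policy output, or comfort already badly violated.
        if not math.isfinite(proposed_c) or worst_abs_pmv > self.trip_abs_pmv:
            return rule_based_c
        return self.clamp(proposed_c, previous_c)


rail = SetpointGuardRail()
print(rail.apply(30.0, 22.0, 21.0, 0.3))          # 23.0: jump is rate-limited
print(rail.apply(float("nan"), 22.0, 21.0, 0.3))  # 21.0: invalid action, rule-based setpoint
```

Here the fallback setpoint is returned unfiltered because the rule-based controller is treated as trusted; a pilot might also rate-limit that handover to avoid abrupt jumps.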

Earning criteria — what you'll demonstrate

  • Implement and tune Soft Actor-Critic for continuous control
  • Design a constrained reward balancing energy and comfort
  • Evaluate policies seasonally on held-out weather
  • Translate RL safety considerations into operational guard rails
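The reporting metrics for the seasonal evaluation can be computed from logged simulation traces with a few small helpers. In this minimal sketch, a "violation minute" is any minute in which at least one occupied zone has |PMV| above 0.5, and stability across seeds is summarized as the mean and sample standard deviation of a metric; both definitions are assumptions, and the function names are hypothetical. Each helper would be run separately for every held-out season and every training seed.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

MINUTES_PER_WEEK = 7 * 24 * 60


def kwh_saved_pct(policy_kwh: float, baseline_kwh: float) -> float:
    """Energy saved relative to the rule-based baseline, in percent (positive = savings)."""
    return 100.0 * (baseline_kwh - policy_kwh) / baseline_kwh


def violation_minutes_per_week(
    pmv_steps: Sequence[Sequence[float]],
    occupied_steps: Sequence[Sequence[bool]],
    step_minutes: float,
    band: float = 0.5,
) -> float:
    """Minutes per week in which any occupied zone has |PMV| > band.

    pmv_steps[t][z] is the PMV of zone z at control step t; occupied_steps has the same shape.
    """
    violating_steps = 0
    for pmv_zones, occ_zones in zip(pmv_steps, occupied_steps):
        if any(occ and abs(p) > band for p, occ in zip(pmv_zones, occ_zones)):
            violating_steps += 1
    simulated_minutes = len(pmv_steps) * step_minutes
    # Scale the violating minutes to a per-week rate.
    return violating_steps * step_minutes * MINUTES_PER_WEEK / simulated_minutes


def seed_stability(per_seed_values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds (needs at least two seeds)."""
    return mean(per_seed_values), stdev(per_seed_values)


# Example: one week at 15-minute steps, with 4 steps where an occupied zone is too warm.
pmv = [[0.0, 0.0]] * 668 + [[0.0, 0.9]] * 4
occupied = [[True, True]] * 672
print(kwh_saved_pct(900.0, 1000.0), violation_minutes_per_week(pmv, occupied, 15))
print(seed_stability([10.0, 12.0, 14.0]))
```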

Program Fit

Where this fits in your program.

Sharpens the same skills your degree expects you to demonstrate.

Careers

Roles this prepares you for.

Real titles. Real skill bridges. Pick the one closest to your trajectory.

Machine Learning Engineer

A deep RL controller trained against a physics-based building simulator under strict safety bounds is the kind of system MLEs ship at industrial and building-controls companies.

This challenge sharpens

  • soft-actor-critic
  • continuous-control
  • simulation

AI Safety Researcher

Designing constrained rewards, analyzing failure modes, and specifying operational guard rails for a learned controller is day-one work for AI safety researchers in applied settings.

This challenge sharpens

  • safety-constraints
  • actor-critic
  • reinforcement-learning

Applied AI Scientist

Translating a research-grade SAC training run into a pilot-ready memo with quantified guard rails is core applied-AI-scientist work.

This challenge sharpens

  • soft-actor-critic
  • safety-constraints
  • simulation

One more thing

You can put a credential on your CV by Friday.

Use Actor-Critic to Auto-Tune an HVAC Control Policy | Ewance Challenge