Reinforcement learning (RL) was first demonstrated to be a feasible approach to controlling heating, ventilation, and air conditioning (HVAC) systems more than a decade ago. However, there has been limited progress towards a practical and scalable RL solution for HVAC control. While one can train an RL agent in simulation, it is not cost-effective to create a model for each thermal zone or building. Likewise, existing RL agents generally take a long time to learn and are opaque to expert interrogation, making them unattractive for real-world deployment. To tackle these challenges, we propose Gnu-RL: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. To achieve this, Gnu-RL adopts a recently-developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, which enables it to match the behavior of the existing controller. Once it is put in charge of controlling the environment, the agent continues to improve its policy end-to-end, using a policy gradient algorithm. We evaluate Gnu-RL on both an EnergyPlus model and a real-world testbed. In both experiments, our agents were directly deployed in the environment after offline pre-training on expert demonstrations. In the simulation experiment, our approach saved 6.6% energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Next, Gnu-RL was deployed to control the HVAC of a real-world conference room for a three-week period. Our results show that Gnu-RL saved 16.7% of cooling demand compared to the existing controller and tracked the temperature set-point better.
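The two-phase recipe described above (offline imitation learning on historical data, then online policy-gradient fine-tuning) can be sketched with a toy linear policy. This is a minimal sketch under stated assumptions: the linear state-feedback policy, the least-squares behavior-cloning step, the quadratic set-point reward, and all variable names are illustrative, not the paper's actual Differentiable MPC controller.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 1  # illustrative dimensions

# --- Phase 1: offline imitation learning on historical data ---
# Behavior cloning by least squares: fit a linear policy a = W @ s to
# logged (state, action) pairs produced by an existing controller.
states = rng.normal(size=(200, state_dim))           # logged zone states
W_expert = rng.normal(size=(action_dim, state_dim))  # stands in for the existing controller
actions = states @ W_expert.T                        # logged control actions
W_pretrained, *_ = np.linalg.lstsq(states, actions, rcond=None)
W_pretrained = W_pretrained.T                        # shape (action_dim, state_dim)

# --- Phase 2: online fine-tuning with a policy-gradient update ---
# A REINFORCE-style step for a Gaussian policy with fixed exploration noise,
# maximizing a toy set-point-tracking reward.
def reward(s, a):
    return -float((a - 0.5 * s.sum()) ** 2)  # toy reward; not the paper's objective

W = W_pretrained.copy()
sigma, lr = 0.1, 1e-3
for _ in range(500):
    s = rng.normal(size=state_dim)
    mean_a = W @ s
    a = mean_a + sigma * rng.normal(size=action_dim)  # exploration noise
    r = reward(s, a)
    # score-function (REINFORCE) gradient of the Gaussian log-likelihood
    grad_W = r * np.outer((a - mean_a) / sigma**2, s)
    W += lr * grad_W  # policy improves end-to-end while deployed
```

Because the pre-training data here is noiseless and linear, the behavior-cloning step recovers the existing controller exactly before any environment interaction occurs, mirroring the deployment order in the abstract.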