Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management
- URL: http://arxiv.org/abs/2601.02061v1
- Date: Mon, 05 Jan 2026 12:35:33 GMT
- Title: Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management
- Authors: Faizan Ahmed, Aniket Dixit, James Brusey
- Abstract summary: We systematically investigate action smoothness regularization through higher-order derivative penalties. Our work establishes higher-order action regularization as an effective bridge between RL optimization and operational constraints in energy-critical applications.
- Score: 1.3891530345631953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning agents often exhibit erratic, high-frequency control behaviors that hinder real-world deployment due to excessive energy consumption and mechanical wear. We systematically investigate action smoothness regularization through higher-order derivative penalties, progressing from theoretical understanding in continuous control benchmarks to practical validation in building energy management. Our comprehensive evaluation across four continuous control environments demonstrates that third-order derivative penalties (jerk minimization) consistently achieve superior smoothness while maintaining competitive performance. We extend these findings to HVAC control systems where smooth policies reduce equipment switching by 60%, translating to significant operational benefits. Our work establishes higher-order action regularization as an effective bridge between RL optimization and operational constraints in energy-critical applications.
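The abstract does not spell out the penalty itself, so the following is only a minimal sketch of how a third-order (jerk) penalty is typically realized: a backward finite difference over recent actions whose squared norm is subtracted from the reward. The weight `lam`, the four-step action history, and the reward-shaping placement are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def jerk_penalty(action_history, lam=0.1):
    """Third-order finite-difference (jerk) penalty over the last four actions.

    action_history: array of shape (4, action_dim) holding
    a_{t-3}, a_{t-2}, a_{t-1}, a_t (oldest first).
    Returns a non-negative scalar to subtract from the environment reward.
    """
    a_tm3, a_tm2, a_tm1, a_t = action_history
    # Discrete third derivative (up to a constant 1/dt^3 factor):
    # a_t - 3*a_{t-1} + 3*a_{t-2} - a_{t-3}
    jerk = a_t - 3.0 * a_tm1 + 3.0 * a_tm2 - a_tm3
    return lam * float(np.sum(jerk ** 2))

# Hypothetical use inside a training loop: shape the reward before storing the transition.
# shaped_reward = reward - jerk_penalty(np.stack(last_four_actions), lam=0.1)
```

Lower-order variants of the same idea penalize the first difference (action rate) or second difference (acceleration); the paper's finding is that the third-order term gives the smoothest policies while remaining competitive on return.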
Related papers
- Control of Rayleigh-Bénard Convection: Effectiveness of Reinforcement Learning in the Turbulent Regime [6.619254876970774]
We study the effectiveness of Reinforcement Learning (RL) for reducing convective heat transfer under increasing turbulence. RL agents trained via single-agent Proximal Policy Optimization (PPO) are compared to linear proportional derivative (PD) controllers. The RL agents reduced convection, measured by the Nusselt Number, by up to 33% in moderately turbulent systems and 10% in highly turbulent settings.
arXiv Detail & Related papers (2025-04-16T11:51:59Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms with surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- A Safe Reinforcement Learning Algorithm for Supervisory Control of Power Plants [7.1771300511732585]
Model-free reinforcement learning (RL) has emerged as a promising solution for control tasks.
We propose a chance-constrained RL algorithm based on Proximal Policy Optimization for supervisory control.
Our approach achieves the smallest distance of violation and violation rate in a load-follow maneuver for an advanced Nuclear Power Plant design.
arXiv Detail & Related papers (2024-01-23T17:52:49Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning [1.7489518849687256]
Reinforcement learning techniques have been developed to optimize industrial cooling systems, offering substantial energy savings.
A major challenge in industrial control involves learning behaviors that are feasible in the real world due to machinery constraints.
We use hierarchical reinforcement learning with multiple agents that control subsets of actions according to their operation time scales.
arXiv Detail & Related papers (2022-09-16T18:00:46Z)
- Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies [45.20170713261535]
We investigate the phenomenon that trained agents often prefer actions at the boundaries of the action space.
We replace the normal Gaussian by a Bernoulli distribution that solely considers the extremes along each action dimension.
Surprisingly, this achieves state-of-the-art performance on several continuous control benchmarks.
arXiv Detail & Related papers (2021-11-03T22:45:55Z)
- Enforcing Policy Feasibility Constraints through Differentiable Projection for Energy Optimization [57.88118988775461]
We propose PROjected Feasibility (PROF) to enforce convex operational constraints within neural policies.
We demonstrate PROF on two applications: energy-efficient building operation and inverter control.
arXiv Detail & Related papers (2021-05-19T01:58:10Z)
- Regularizing Action Policies for Smooth Control with Reinforcement Learning [47.312768123967025]
Conditioning for Action Policy Smoothness (CAPS) is an effective yet intuitive regularization on action policies.
CAPS offers consistent improvement in the smoothness of the learned state-to-action mappings of neural network controllers.
Tested on a real system, improvements in controller smoothness on a quadrotor drone resulted in an almost 80% reduction in power consumption.
arXiv Detail & Related papers (2020-12-11T21:35:24Z)
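As a companion to the CAPS entry directly above, here is a minimal sketch of its two smoothness terms as commonly described: a temporal term keeping actions close across consecutive states and a spatial term keeping actions close across noise-perturbed states. The weights lam_temporal and lam_spatial, the noise scale sigma, and the deterministic policy(states) call are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def caps_smoothness_loss(policy, states, next_states,
                         lam_temporal=1.0, lam_spatial=0.5, sigma=0.05):
    """CAPS-style smoothness regularizer to add to the usual policy loss.

    Temporal term: actions for consecutive states should stay close.
    Spatial term: actions for nearby (noise-perturbed) states should stay close.
    """
    actions = policy(states)
    # Temporal smoothness: || pi(s_t) - pi(s_{t+1}) ||
    temporal = torch.norm(actions - policy(next_states), dim=-1).mean()
    # Spatial smoothness: || pi(s_t) - pi(s_t + eps) ||, eps ~ N(0, sigma^2)
    perturbed = states + sigma * torch.randn_like(states)
    spatial = torch.norm(actions - policy(perturbed), dim=-1).mean()
    return lam_temporal * temporal + lam_spatial * spatial

# Hypothetical use: total_loss = actor_loss + caps_smoothness_loss(pi, s_batch, s_next_batch)
```

In contrast to such state-conditioned smoothness losses, the higher-order derivative penalties of the main paper act directly on the executed action sequence, which is what makes them natural for limiting equipment switching in HVAC control.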