An adaptive safety layer with hard constraints for safe reinforcement
learning in multi-energy management systems
- URL: http://arxiv.org/abs/2304.08897v3
- Date: Mon, 6 Nov 2023 08:27:14 GMT
- Title: An adaptive safety layer with hard constraints for safe reinforcement
learning in multi-energy management systems
- Authors: Glenn Ceusters, Muhammad Andy Putratama, Rüdiger Franke, Ann Nowé,
Maarten Messagie
- Abstract summary: Safe reinforcement learning with hard constraint guarantees is a promising optimal control direction for multi-energy management systems.
We present two novel advancements: (I) combining the OptLayer and SafeFallback methods, named OptLayerPolicy, to increase the initial utility; (II) introducing self-improving hard constraints that become more accurate as new data arrives.
We show that, in a simulated multi-energy system case study, the initial utility increases to 92.4% (OptLayerPolicy) compared with 86.1% (OptLayer), and that the utility of the trained policy increases to 104.9% (GreyOptLayerPolicy) compared with 103.4% (OptLayer).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Safe reinforcement learning (RL) with hard constraint guarantees is a
promising optimal control direction for multi-energy management systems. It
requires only the environment-specific constraint functions themselves a priori,
not a complete model. The project-specific upfront and ongoing engineering
efforts are therefore still reduced, better representations of the underlying
system dynamics can still be learnt, and modelling bias is kept to a minimum.
However, even the constraint functions alone are not always trivial to provide
accurately in advance, which can lead to unsafe behaviour. In this paper, we
present two novel advancements: (I) combining the OptLayer and SafeFallback
methods, named OptLayerPolicy, to increase the initial utility while retaining
high sample efficiency and the ability to formulate equality constraints; and
(II) introducing self-improving hard constraints, which increase the accuracy
of the constraint functions as more data becomes available, so that better
policies can be learnt. Both advancements keep the constraint formulation
decoupled from the RL formulation, so new (presumably better) RL algorithms can
act as drop-in replacements. We show that, in a simulated multi-energy system
case study, the initial utility increases to 92.4% (OptLayerPolicy) compared
with 86.1% (OptLayer), and that the utility of the trained policy increases to
104.9% (GreyOptLayerPolicy) compared with 103.4% (OptLayer), all relative to a
vanilla RL benchmark. Although introducing surrogate functions into the
optimisation problem requires special attention, we conclude that the newly
presented GreyOptLayerPolicy method is the most advantageous.
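As a minimal illustration of the mechanism described above: the safety layer
projects the action proposed by the RL agent onto the set of actions satisfying
hard, possibly learnt, constraints. The sketch below shows that general pattern
under stated assumptions and is not the authors' OptLayerPolicy or
GreyOptLayerPolicy implementation: it assumes linear constraints, uses cvxpy for
the projection QP and scikit-learn for a linear surrogate constraint, and the
names (project_action, fit_surrogate, surrogate_row) and toy data are
hypothetical.

```python
# Minimal sketch of a safety-layer projection with a self-improving (surrogate)
# hard constraint. Assumptions: continuous action vectors, (locally) linear
# constraints, and logged measurements of a constraint value g(s, a) <= 0.
import numpy as np
import cvxpy as cp
from sklearn.linear_model import LinearRegression


def project_action(a_rl, A_ineq, b_ineq, A_eq=None, b_eq=None):
    """Return the feasible action closest (in L2 norm) to the RL proposal a_rl."""
    a = cp.Variable(a_rl.shape[0])
    constraints = [A_ineq @ a <= b_ineq]        # hard inequality constraints
    if A_eq is not None:
        constraints.append(A_eq @ a == b_eq)    # e.g. an energy-balance equality
    cp.Problem(cp.Minimize(cp.sum_squares(a - a_rl)), constraints).solve()
    return a.value


def fit_surrogate(states, actions, g_measured):
    """Fit a linear surrogate g(s, a) ~ w_s*s + w_a*a + c from logged data;
    refitting it as more data arrives is the 'self-improving' step."""
    model = LinearRegression()
    model.fit(np.hstack([states, actions]), g_measured)
    return model


def surrogate_row(model, state, n_actions):
    """Linearise the fitted surrogate at the current state into one QP row,
    i.e. turn g(s, a) <= 0 into w_a*a <= -c - w_s*s."""
    w_s, w_a = model.coef_[: state.shape[0]], model.coef_[state.shape[0]:]
    return w_a.reshape(1, n_actions), np.array([-model.intercept_ - w_s @ state])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data with a hidden constraint g(s, a) = a - 0.8 <= 0.
    S, A = rng.uniform(size=(200, 1)), rng.uniform(size=(200, 1))
    model = fit_surrogate(S, A, (A - 0.8).ravel())
    A_row, b_row = surrogate_row(model, state=np.array([0.5]), n_actions=1)
    print(project_action(np.array([1.3]), A_row, b_row))  # ~[0.8]: proposal clipped
```

Because the surrogate itself enters the optimisation problem, the projection is
only as safe as the learnt constraint, which is the "special attention" the
abstract notes for GreyOptLayerPolicy.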
Related papers
- Offline RL via Feature-Occupancy Gradient Ascent [9.983014605039658]
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs).
We develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies.
We show that the resulting simple algorithm satisfies strong computational and sample complexity guarantees.
arXiv Detail & Related papers (2024-05-22T15:39:05Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation, with performance stronger than or similar to PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Safe reinforcement learning for multi-energy management systems with known constraint functions [0.0]
Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems.
We present two novel safe RL methods, namely SafeFallback and GiveSafe.
In a simulated multi-energy systems case study, we show that both methods start with a significantly higher utility.
arXiv Detail & Related papers (2022-07-08T11:33:53Z)
- Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes the policy for violating the SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO).
arXiv Detail & Related papers (2021-07-13T21:39:21Z)
- Enforcing Policy Feasibility Constraints through Differentiable Projection for Energy Optimization [57.88118988775461]
We propose PROjected Feasibility (PROF) to enforce convex operational constraints within neural policies.
We demonstrate PROF on two applications: energy-efficient building operation and inverter control.
arXiv Detail & Related papers (2021-05-19T01:58:10Z)
- First Order Constrained Optimization in Policy Space [19.00289722198614]
We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS)
FOCOPS maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints.
We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotic locomotion tasks.
arXiv Detail & Related papers (2020-02-16T05:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.