Gradient Shaping for Multi-Constraint Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2312.15127v1
- Date: Sat, 23 Dec 2023 00:55:09 GMT
- Title: Gradient Shaping for Multi-Constraint Safe Reinforcement Learning
- Authors: Yihang Yao, Zuxin Liu, Zhepeng Cen, Peide Huang, Tingnan Zhang, Wenhao Yu, Ding Zhao
- Abstract summary: Online safe reinforcement learning (RL) involves training a policy that maximizes task efficiency while satisfying constraints by interacting with the environment.
We propose a unified framework designed for MC safe RL algorithms.
We introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms to improve the training efficiency in terms of both reward and constraint satisfaction.
- Score: 31.297400160104853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online safe reinforcement learning (RL) involves training a policy that
maximizes task efficiency while satisfying constraints by interacting with the
environment. In this paper, we focus on the
challenges associated with solving multi-constraint (MC) safe RL problems. We
approach the safe RL problem from the perspective of Multi-Objective
Optimization (MOO) and propose a unified framework designed for MC safe RL
algorithms. This framework highlights the manipulation of gradients derived
from constraints. Leveraging insights from this framework and recognizing the
significance of redundant and conflicting constraint
conditions, we introduce the Gradient Shaping (GradS) method for general
Lagrangian-based safe RL algorithms to improve the training efficiency in terms
of both reward and constraint satisfaction. Extensive experiments show that
the proposed method encourages exploration and learns a policy that improves
both safety and reward performance across various challenging MC safe RL
tasks, and that it scales well with the number of constraints.
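The abstract refers to Lagrangian-based updates and to redundant and conflicting constraint conditions without giving formulas. The sketch below is only an illustration under stated assumptions, not the GradS method itself: it shows a generic multi-constraint Lagrangian direction, a standard dual-ascent multiplier update, and a hypothetical cosine-similarity heuristic for labelling pairs of constraint gradients. All function names, the `tol` threshold, and the cosine-based labels are assumptions introduced for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is zero."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def lagrangian_direction(reward_grad, cost_grads, multipliers):
    """Ascent direction of L = J_r - sum_i lambda_i * J_c_i (generic form)."""
    direction = list(reward_grad)
    for lam, grad in zip(multipliers, cost_grads):
        direction = [d - lam * g for d, g in zip(direction, grad)]
    return direction


def update_multipliers(multipliers, costs, thresholds, lr=0.1):
    """Dual ascent: raise lambda_i when cost_i exceeds its threshold, clip at 0."""
    return [
        max(0.0, lam + lr * (cost - limit))
        for lam, cost, limit in zip(multipliers, costs, thresholds)
    ]


def classify_pairs(cost_grads, tol=0.9):
    """Hypothetical heuristic (an assumption, not the paper's definition):
    near-parallel gradients are labelled 'redundant', opposing ones 'conflicting'."""
    labels = {}
    for i in range(len(cost_grads)):
        for j in range(i + 1, len(cost_grads)):
            c = cosine(cost_grads[i], cost_grads[j])
            if c >= tol:
                labels[(i, j)] = "redundant"
            elif c < 0.0:
                labels[(i, j)] = "conflicting"
            else:
                labels[(i, j)] = "neutral"
    return labels


# Toy example with three constraints in a 2-D parameter space.
reward_grad = [1.0, 0.0]
cost_grads = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
multipliers = [0.5, 0.25, 0.0]

direction = lagrangian_direction(reward_grad, cost_grads, multipliers)
labels = classify_pairs(cost_grads)
new_multipliers = update_multipliers(
    multipliers, costs=[3.0, 1.0, 0.0], thresholds=[1.0, 1.0, 1.0]
)
```

In this toy case the first two constraint gradients point the same way, so together they cancel the reward gradient, while the third opposes them. This is the kind of interaction between constraint gradients that the abstract says the framework manipulates.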
Related papers
- Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning [26.244121960815907]
We propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence.
Our method employs a novel natural policy gradient manipulation method to optimize multiple RL objectives.
Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.
arXiv Detail & Related papers (2024-05-26T00:42:10Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- A Survey of Constraint Formulations in Safe Reinforcement Learning [15.593999581562203]
Safety is critical when applying reinforcement learning to real-world problems.
A prevalent safe RL approach is based on a constrained criterion, which seeks to maximize the expected cumulative reward.
Despite recent effort to enhance safety in RL, a systematic understanding of the field remains difficult.
arXiv Detail & Related papers (2024-02-03T04:40:31Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Constrained Decision Transformer for Offline Safe Reinforcement Learning [16.485325576173427]
We study the offline safe RL problem from a novel multi-objective optimization perspective.
We propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-offs during deployment.
arXiv Detail & Related papers (2023-02-14T21:27:10Z)
- State-wise Safe Reinforcement Learning: A Survey [5.826308050755618]
State-wise constraints are one of the most common constraints in real-world applications.
This paper provides a review of existing approaches that address state-wise constraints in RL.
arXiv Detail & Related papers (2023-02-06T21:11:29Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Constrained Markov Decision Processes via Backward Value Functions [43.649330976089004]
We model the problem of learning with constraints as a Constrained Markov Decision Process.
A key contribution of our approach is to translate cumulative cost constraints into state-based constraints.
We provide theoretical guarantees under which the agent converges while ensuring safety over the course of training.
arXiv Detail & Related papers (2020-08-26T20:56:16Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.