SAAC: Safe Reinforcement Learning as an Adversarial Game of
Actor-Critics
- URL: http://arxiv.org/abs/2204.09424v1
- Date: Wed, 20 Apr 2022 12:32:33 GMT
- Title: SAAC: Safe Reinforcement Learning as an Adversarial Game of
Actor-Critics
- Authors: Yannis Flet-Berliac and Debabrota Basu
- Abstract summary: We develop a safe adversarially guided soft actor-critic framework called SAAC.
In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function.
We show that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints.
- Score: 11.132587007566329
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although Reinforcement Learning (RL) is effective for sequential
decision-making problems under uncertainty, it still fails to thrive in
real-world systems where risk or safety is a binding constraint. In this paper,
we formulate the RL problem with safety constraints as a non-zero-sum game.
While deployed with maximum entropy RL, this formulation leads to a safe
adversarially guided soft actor-critic framework, called SAAC. In SAAC, the
adversary aims to break the safety constraint while the RL agent aims to
maximize the constrained value function given the adversary's policy. The
safety constraint on the agent's value function manifests only as a repulsion
term between the agent's and the adversary's policies. Unlike previous
approaches, SAAC can address different safety criteria such as safe
exploration, mean-variance risk sensitivity, and CVaR-like coherent risk
sensitivity. We illustrate the design of the adversary for these constraints.
Then, in each of these variations, we show the agent differentiates itself from
the adversary's unsafe actions in addition to learning to solve the task.
Finally, for challenging continuous control tasks, we demonstrate that SAAC
achieves faster convergence, better efficiency, and fewer failures to satisfy
the safety constraints than risk-averse distributional RL and risk-neutral soft
actor-critic algorithms.
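The coupling between the two policies lends itself to a short sketch. Below is a minimal, hedged illustration of what an agent-side actor objective in this spirit could look like: a standard soft actor-critic loss augmented with a term that repels the agent's policy from the adversary's. The diagonal Gaussian policy form, the use of KL divergence as the repulsion measure, and the coefficients alpha and beta are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch (not the paper's exact formulation) of an agent-side actor
# objective with a repulsion term from the adversary's policy, in the spirit
# of the SAAC abstract. Assumptions: diagonal Gaussian policies, KL divergence
# as the repulsion measure, hand-picked coefficients alpha (entropy) and
# beta (repulsion).
import torch
import torch.distributions as D

def agent_actor_loss(agent_policy, adversary_policy, q_critic, states,
                     alpha=0.2, beta=1.0):
    """Loss to minimize for the protagonist (RL agent) at a batch of states."""
    # Agent's current policy (mean and log-std of a diagonal Gaussian).
    mu, log_std = agent_policy(states)
    pi_agent = D.Normal(mu, log_std.exp())
    actions = pi_agent.rsample()                     # reparameterized sample
    log_prob = pi_agent.log_prob(actions).sum(-1)

    # Adversary's policy at the same states; no gradient flows to the adversary.
    with torch.no_grad():
        mu_adv, log_std_adv = adversary_policy(states)
    pi_adv = D.Normal(mu_adv, log_std_adv.exp())

    # Repulsion term: KL(agent || adversary). Maximizing it pushes the agent
    # away from the adversary, who is trained to violate the safety constraint.
    repulsion = D.kl_divergence(pi_agent, pi_adv).sum(-1)

    # Soft actor-critic style objective: maximize Q-value, entropy, repulsion.
    q_value = q_critic(states, actions)
    return (alpha * log_prob - q_value - beta * repulsion).mean()
```

In this reading, the adversary's actor is trained separately with its own constraint-violating objective; only the agent-side update is sketched here.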
Related papers
- Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical
Systems [15.863561935347692]
We develop provably safe and convergent reinforcement learning algorithms for control of nonlinear dynamical systems.
Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints.
We develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees.
arXiv Detail & Related papers (2024-03-06T19:39:20Z)
- Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward-maximizing objectives according to a safety critic.
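Read literally, such suppression can be pictured as down-weighting the task objective whenever the safety critic predicts a high constraint cost. The sketch below is only an illustration of that idea; the sigmoid weighting, threshold, and temperature are assumptions, not the method's actual scheme.

```python
# Illustration only: down-weight the task (reward) objective when a safety
# critic predicts a high constraint cost. The sigmoid weighting, threshold,
# and temperature are assumptions, not the method's actual scheme.
import torch

def suppressed_task_objective(task_objective, safety_critic, states, actions,
                              cost_threshold=0.1, temperature=10.0):
    # Predicted constraint cost for the current state-action pairs.
    predicted_cost = safety_critic(states, actions)

    # Suppression weight in (0, 1): near 1 when the predicted cost is well
    # below the threshold, near 0 when it exceeds it.
    weight = torch.sigmoid(temperature * (cost_threshold - predicted_cost))

    # Scale only the reward-maximizing part of the loss; detach so the weight
    # acts as a gate rather than a gradient path into the safety critic.
    return weight.detach() * task_objective
```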
arXiv Detail & Related papers (2024-02-23T23:22:06Z)
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Robust Safe Reinforcement Learning under Adversarial Disturbances [12.145611442959602]
Safety is a primary concern when applying reinforcement learning to real-world control tasks.
Existing safe reinforcement learning algorithms rarely account for external disturbances.
This paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances.
arXiv Detail & Related papers (2023-10-11T05:34:46Z)
- Safe Reinforcement Learning with Dual Robustness [10.455148541147796]
Reinforcement learning (RL) agents are vulnerable to adversarial disturbances.
We propose a systematic framework to unify safe RL and robust RL.
We also design a deep RL algorithm for practical implementation, called dually robust actor-critic (DRAC).
arXiv Detail & Related papers (2023-09-13T09:34:21Z)
- Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
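The one-line description above is concrete enough for a small sketch: a reward critic estimating constraint-free returns is discounted by a safety critic's predicted probability of constraint violation. The module names and the sigmoid-on-logit composition below are assumptions, not the paper's exact architecture.

```python
# Sketch of a multiplicative value estimate: a reward critic estimating
# constraint-free returns is discounted by a safety critic's predicted
# probability of constraint violation. Module names and the sigmoid-on-logit
# composition are illustrative assumptions.
import torch
import torch.nn as nn

class MultiplicativeValue(nn.Module):
    def __init__(self, reward_critic: nn.Module, safety_critic: nn.Module):
        super().__init__()
        self.reward_critic = reward_critic   # estimates constraint-free return
        self.safety_critic = safety_critic   # outputs a violation-probability logit

    def forward(self, state, action):
        reward_value = self.reward_critic(state, action)
        p_violation = torch.sigmoid(self.safety_critic(state, action))
        # Discount the return estimate by the probability of remaining safe.
        return (1.0 - p_violation) * reward_value
```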
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safety rate, as measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z)
- Safe Reinforcement Learning Using Advantage-Based Intervention [45.79740561754542]
Many sequential decision problems involve finding a policy that maximizes total reward while obeying safety constraints.
We propose a new algorithm, SAILR, that uses an intervention mechanism based on advantage functions to keep the agent safe throughout training.
Our method comes with strong guarantees on safety during both training and deployment.
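One plausible reading of such an advantage-based intervention gate is sketched below: the learner's proposed action is executed only while a cost-advantage estimate relative to a backup policy stays under a threshold, otherwise the backup action is substituted. The critic interfaces, the threshold eta, and the backup policy are illustrative assumptions, not SAILR's exact rule.

```python
# One possible reading of an advantage-based intervention gate (not SAILR's
# exact rule): execute the learner's proposed action only if its cost-advantage
# relative to a backup policy stays below a threshold eta; otherwise substitute
# the backup action. cost_q, cost_v, and backup_policy are assumed callables.

def gated_action(state, proposed_action, backup_policy, cost_q, cost_v,
                 eta=0.0):
    # Advantage of the proposed action under the backup policy's cost critic.
    advantage = float(cost_q(state, proposed_action)) - float(cost_v(state))

    if advantage > eta:
        # The proposed action looks riskier than what the backup policy would
        # do in this state: intervene and execute the backup action instead.
        return backup_policy(state), True   # (action, intervened)
    return proposed_action, False
```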
arXiv Detail & Related papers (2021-06-16T20:28:56Z)