Long-term Safe Reinforcement Learning with Binary Feedback
- URL: http://arxiv.org/abs/2401.03786v2
- Date: Thu, 11 Jan 2024 11:59:25 GMT
- Title: Long-term Safe Reinforcement Learning with Binary Feedback
- Authors: Akifumi Wachi, Wataru Hashimoto, Kazumune Hashimoto
- Abstract summary: Long-term Binary-feedback Safe RL (LoBiSaRL) is a safe RL algorithm for constrained Markov decision processes with binary safety feedback.
Our theoretical results show that LoBiSaRL guarantees the long-term safety constraint, with high probability.
- Score: 5.684409853507594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety is an indispensable requirement for applying reinforcement learning
(RL) to real problems. Although there has been a surge of safe RL algorithms
proposed in recent years, most existing work typically 1) relies on receiving
numeric safety feedback; 2) does not guarantee safety during the learning
process; 3) limits the problem to a priori known, deterministic transition
dynamics; and/or 4) assumes the existence of a known safe policy for any state.
To address these issues, we propose Long-term Binary-feedback
Safe RL (LoBiSaRL), a safe RL algorithm for constrained Markov decision
processes (CMDPs) with binary safety feedback and an unknown, stochastic state
transition function. LoBiSaRL optimizes a policy to maximize rewards while
guaranteeing the long-term safety constraint that the agent executes only safe state-action
pairs throughout each episode with high probability. Specifically, LoBiSaRL
models the binary safety function via a generalized linear model (GLM) and
conservatively takes only a safe action at every time step while inferring its
effect on future safety under proper assumptions. Our theoretical results show
that LoBiSaRL guarantees the long-term safety constraint, with high
probability. Finally, our empirical results demonstrate that our algorithm is
safer than existing methods without significantly compromising performance in
terms of reward.
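As a rough illustration of the two ingredients named in the abstract, the sketch below fits a logistic-regression GLM to binary safety labels and then keeps only the candidate actions whose pessimistic (lower-confidence) safety estimate clears a threshold. It is not the paper's algorithm: the feature map, the confidence bonus `beta`, and the threshold are assumptions made for this example.

```python
"""Illustrative sketch (not the LoBiSaRL implementation): (1) a GLM (logistic
regression) fit on binary safety labels of executed state-action pairs, and
(2) a conservative filter that only allows actions whose pessimistic safety
estimate clears a threshold. Features, `beta`, and `thresh` are assumptions."""
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_safety_glm(X, y, lam=1.0, lr=0.1, iters=500):
    """Fit logistic regression (a GLM with Bernoulli likelihood) by gradient ascent.
    X: (n, d) state-action features, y: (n,) binary safety labels."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ theta)) - lam * theta
        theta += lr * grad / n
    A = X.T @ X + lam * np.eye(d)          # regularized design matrix
    return theta, np.linalg.inv(A)

def pessimistic_safe_actions(theta, A_inv, phi_candidates, beta=1.0, thresh=0.5):
    """Keep only candidate actions whose lower-confidence safety probability
    exceeds `thresh`. phi_candidates: (k, d) features at the current state."""
    margins = phi_candidates @ theta
    widths = beta * np.sqrt(np.einsum("ki,ij,kj->k", phi_candidates, A_inv, phi_candidates))
    lcb = sigmoid(margins - widths)        # pessimistic estimate of P(safe | s, a)
    return np.where(lcb >= thresh)[0]

# Toy usage: safety depends only on the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)
theta, A_inv = fit_safety_glm(X, y)
cands = rng.normal(size=(10, 3))
print("conservatively allowed actions:", pessimistic_safe_actions(theta, A_inv, cands))
```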
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest advances in model-based RL.
arXiv Detail & Related papers (2024-10-12T10:46:02Z) - Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning [57.84059344739159]
"Shielding" is a popular technique to enforce safety in Reinforcement Learning (RL).
We propose a new permissibility-based framework to deal with safety and shield construction.
arXiv Detail & Related papers (2024-05-29T18:00:21Z) - Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
arXiv Detail & Related papers (2024-05-05T17:27:22Z) - Implicit Safe Set Algorithm for Provably Safe Reinforcement Learning [7.349727826230864]
We present a model-free safe control algorithm, the implicit safe set algorithm, for synthesizing safeguards for DRL agents.
The proposed algorithm synthesizes a safety index (barrier certificate) and a subsequent safe control law solely by querying a black-box dynamic function.
We validate the proposed algorithm on the state-of-the-art Safety Gym benchmark, where it achieves zero safety violations while gaining $95\% \pm 9\%$ cumulative reward.
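The safety-index mechanism can be illustrated with a toy safeguard that only queries a black-box dynamics function. The 1-D double-integrator dynamics, the hand-written index, and the sampling-based fallback below are assumptions made for illustration, not the paper's synthesis procedure.

```python
"""Toy illustration of the safety-index idea: a safety index phi(s) is kept from
growing once positive, using only queries to a black-box dynamics function."""
import numpy as np

def black_box_dynamics(state, action, dt=0.1):
    """Unknown-to-the-agent dynamics; the safeguard may only query it."""
    pos, vel = state
    return np.array([pos + vel * dt, vel + action * dt])

def safety_index(state, pos_limit=1.0):
    """phi > 0 indicates the constraint |pos| <= pos_limit is (about to be) violated."""
    pos, vel = state
    return abs(pos) + 0.5 * np.sign(pos) * vel - pos_limit

def safe_control(state, nominal_action, action_grid=np.linspace(-2.0, 2.0, 41)):
    """Keep the nominal action while the system stays in the phi < 0 sublevel set;
    otherwise pick the queried action that lowers phi the most."""
    phi_now = safety_index(state)
    phi_nominal = safety_index(black_box_dynamics(state, nominal_action))
    if phi_now < 0 and phi_nominal < 0:
        return nominal_action                      # nominal action keeps the system safe
    phis = np.array([safety_index(black_box_dynamics(state, a)) for a in action_grid])
    return action_grid[int(np.argmin(phis))]       # most safety-improving fallback

state = np.array([0.9, 0.8])                       # near the boundary, moving toward it
print("filtered action:", safe_control(state, nominal_action=1.5))
```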
arXiv Detail & Related papers (2024-05-04T20:59:06Z) - Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution to the generalized safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE.
Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z) - A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
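A minimal sketch of the multiplicative composition described above, with made-up critic values for a single state (the paper learns both critics; this toy example does not):

```python
"""Minimal sketch of the multiplicative value function idea: a safety critic
estimating the probability of constraint violation multiplicatively discounts a
reward critic that estimates constraint-free returns. Values are illustrative."""
import numpy as np

q_reward = np.array([5.0, 8.0, 12.0])      # reward critic: constraint-free return per action
p_violate = np.array([0.05, 0.20, 0.70])   # safety critic: probability of constraint violation

q_combined = (1.0 - p_violate) * q_reward  # multiplicative value: discount risky actions
print("combined values:", q_combined)      # [4.75, 6.4, 3.6]
print("greedy action:", int(np.argmax(q_combined)))  # action 1: good reward, acceptable risk
```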
arXiv Detail & Related papers (2023-03-07T18:29:15Z) - Online Shielding for Reinforcement Learning [59.86192283565134]
We propose an approach for online safety shielding of RL agents.
During runtime, the shield analyses the safety of each available action, i.e., the probability that executing it leads to a safety violation.
Based on this probability and a given threshold, the shield decides whether to block an action from the agent.
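A minimal sketch of that runtime decision rule, assuming the per-action violation probabilities are already available (in the paper they come from analysing a model of the environment, which this toy omits):

```python
"""Minimal sketch of a threshold-based shield: actions whose estimated violation
probability exceeds the threshold are blocked; if all are blocked, fall back to
the least risky one. How the probabilities are computed is outside this example."""
import numpy as np

def shield(agent_preferences, p_unsafe, threshold=0.1):
    """Return the agent's most preferred action among those the shield allows."""
    allowed = p_unsafe <= threshold
    if not allowed.any():
        return int(np.argmin(p_unsafe))               # all blocked: least risky fallback
    prefs = np.where(allowed, agent_preferences, -np.inf)
    return int(np.argmax(prefs))

prefs = np.array([0.9, 0.6, 0.2])    # agent would prefer action 0
risk = np.array([0.30, 0.05, 0.01])  # but action 0 exceeds the risk threshold
print("executed action:", shield(prefs, risk))  # -> 1
```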
arXiv Detail & Related papers (2022-12-04T16:00:29Z) - Provable Safe Reinforcement Learning with Binary Feedback [62.257383728544006]
We consider the problem of provably safe RL when given access to an offline oracle providing binary feedback on the safety of state-action pairs.
We provide a novel meta-algorithm, SABRE, which can be applied to any MDP setting given access to a black-box PAC RL algorithm for that setting.
arXiv Detail & Related papers (2022-10-26T05:37:51Z) - Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking [12.719948223824483]
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlocking their potential for many real-world tasks.
However, vanilla RL and most safe RL approaches do not guarantee safety.
We introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods.
We provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
arXiv Detail & Related papers (2022-05-13T16:34:36Z) - SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation [63.25418599322092]
Satisfying safety constraints almost surely (or with probability one) can be critical for deployment of Reinforcement Learning (RL) in real-life applications.
We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs).
We show that Saute MDPs allow the safe RL problem to be viewed from a different perspective, enabling new features.
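A minimal sketch of the state-augmentation idea, assuming the environment reports a per-step safety cost: the remaining safety budget is carried as an extra state variable, and the reward is replaced by a penalty once the budget is exhausted. The wrapper interface and penalty value are illustrative, not the paper's code.

```python
"""Sketch of a safety-augmented environment wrapper: the remaining safety budget
becomes part of the state, so the constraint is folded into the dynamics."""

class SautedEnv:
    def __init__(self, env, safety_budget, penalty=-10.0):
        self.env = env
        self.safety_budget = safety_budget
        self.penalty = penalty

    def reset(self):
        self.remaining = self.safety_budget
        return (self.env.reset(), self.remaining)           # augmented state

    def step(self, action):
        state, reward, cost, done = self.env.step(action)   # assumed env signature
        self.remaining -= cost                               # budget shrinks with incurred cost
        if self.remaining < 0:
            reward = self.penalty                            # exhausted budget: penalized reward
        return (state, self.remaining), reward, done

class ToyEnv:
    """Stand-in environment: each step gives reward 1 and a safety cost equal to the action."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, float(action), self.t >= 5      # state, reward, cost, done

env = SautedEnv(ToyEnv(), safety_budget=2.0)
s = env.reset()
print(env.step(1.0))   # budget 1.0 left, reward unchanged
print(env.step(1.5))   # budget -0.5: reward replaced by the penalty
```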
arXiv Detail & Related papers (2022-02-14T08:57:01Z)