Provable Safe Reinforcement Learning with Binary Feedback
- URL: http://arxiv.org/abs/2210.14492v1
- Date: Wed, 26 Oct 2022 05:37:51 GMT
- Title: Provable Safe Reinforcement Learning with Binary Feedback
- Authors: Andrew Bennett, Dipendra Misra, Nathan Kallus
- Abstract summary: We consider the problem of provable safe RL when given access to an offline oracle providing binary feedback on the safety of state, action pairs.
We provide a novel meta algorithm, SABRE, which can be applied to any MDP setting given access to a blackbox PAC RL algorithm for that setting.
- Score: 62.257383728544006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety is a crucial necessity in many applications of reinforcement learning
(RL), whether robotic, automotive, or medical. Many existing approaches to safe
RL rely on receiving numeric safety feedback, but in many cases this feedback
can only take binary values; that is, whether an action in a given state is
safe or unsafe. This is particularly true when feedback comes from human
experts. We therefore consider the problem of provable safe RL when given
access to an offline oracle providing binary feedback on the safety of state,
action pairs. We provide a novel meta algorithm, SABRE, which can be applied to
any MDP setting given access to a blackbox PAC RL algorithm for that setting.
SABRE applies concepts from active learning to reinforcement learning to
provably control the number of queries to the safety oracle. SABRE works by
iteratively exploring the state space to find regions where the agent is
currently uncertain about safety. Our main theoretical results show that,
under appropriate technical assumptions, SABRE never takes unsafe actions
during training, and is guaranteed to return a near-optimal safe policy with
high probability. We provide a discussion of how our meta-algorithm may be
applied to various settings studied in both theoretical and empirical
frameworks.
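
The abstract describes SABRE only at a high level. As a rough illustration of the loop it outlines (explore to find state-action pairs whose safety is still uncertain, query the binary offline oracle on those pairs, and act only where safety is already established), here is a minimal sketch. Every interface below (the environment functions, the version_space object and its methods) is a hypothetical stand-in, not the paper's actual construction or guarantees.

    import random

    # Hypothetical sketch of a SABRE-style training loop. States are assumed
    # hashable and the action set finite, purely for readability.
    def sabre_style_loop(env_reset, env_step, actions, safety_oracle,
                         version_space, n_rounds=10, episodes=50, horizon=20):
        """version_space is assumed to expose:
          .is_uncertain(s, a) -> bool   (hypotheses disagree on safety)
          .is_safe(s, a)      -> bool   (all remaining hypotheses say safe)
          .update(s, a, label)          (incorporate a binary oracle label)
        """
        queried = set()
        for _ in range(n_rounds):
            uncertain = set()
            # Exploration phase: only take actions the version space certifies
            # as safe, while recording pairs whose safety is still uncertain.
            for _ in range(episodes):
                s = env_reset()
                for _ in range(horizon):
                    uncertain.update((s, a) for a in actions
                                     if version_space.is_uncertain(s, a))
                    safe_actions = [a for a in actions if version_space.is_safe(s, a)]
                    if not safe_actions:
                        break  # no certifiably safe action available
                    s, done = env_step(s, random.choice(safe_actions))
                    if done:
                        break
            # Offline labelling phase: query the binary safety oracle only on
            # newly discovered uncertain pairs (this is what bounds the queries).
            for (s, a) in uncertain - queried:
                version_space.update(s, a, safety_oracle(s, a))
                queried.add((s, a))
        # A black-box PAC RL algorithm would then be run on the MDP restricted
        # to the actions now labelled safe; that step is omitted here.
        return version_space, len(queried)

The actual algorithm's query bounds and never-unsafe guarantee rest on the technical assumptions stated in the paper; the sketch only conveys the shape of the iteration.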
Related papers
- ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning [48.536695794883826]
We present ActSafe, a novel model-based RL algorithm for safe and efficient exploration.
We show that ActSafe guarantees safety during learning while also obtaining a near-optimal policy in finite time.
In addition, we propose a practical variant of ActSafe that builds on the latest model-based RL advancements.
arXiv Detail & Related papers (2024-10-12T10:46:02Z)
- Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding [5.5929450570003185]
Training RL agents in unknown, black-box environments poses an even greater safety risk when prior knowledge of the domain/task is unavailable.
We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training.
arXiv Detail & Related papers (2024-05-28T13:47:21Z)
- Long-term Safe Reinforcement Learning with Binary Feedback [5.684409853507594]
Long-term Binary Safe RL (LoBiSaRL) is a safe RL algorithm for constrained Markov decision processes.
Our theoretical results show that LoBiSaRL guarantees the long-term safety constraint with high probability.
arXiv Detail & Related papers (2024-01-08T10:07:31Z)
- Safe Reinforcement Learning in a Simulated Robotic Arm [0.0]
Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies.
In this paper, we extend the applicability of safe RL algorithms by creating a customized environment with a Panda robotic arm.
arXiv Detail & Related papers (2023-11-28T19:22:16Z)
- OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research [3.0536277689386453]
We introduce a foundational framework designed to expedite SafeRL research endeavors.
Our framework encompasses an array of algorithms spanning different RL domains and places heavy emphasis on safety elements.
arXiv Detail & Related papers (2023-05-16T09:22:14Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic, which only estimates constraint-free returns (a hypothetical sketch of this multiplicative form appears after this list).
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking [12.719948223824483]
Ensuring the safety of reinforcement learning (RL) algorithms is crucial to unlock their potential for many real-world tasks.
However, vanilla RL and most safe RL approaches do not guarantee safety.
We introduce a categorization of existing provably safe RL methods, present the conceptual foundations for both continuous and discrete action spaces, and empirically benchmark existing methods.
We provide practical guidance on selecting provably safe RL approaches depending on the safety specification, RL algorithm, and type of action space.
arXiv Detail & Related papers (2022-05-13T16:34:36Z)
- Learning to be Safe: Deep RL with a Safety Critic [72.00568333130391]
A natural first approach toward safe RL is to manually specify constraints on the policy's behavior.
We propose to learn how to be safe in one set of tasks and environments, and then use that learned intuition to constrain future behaviors.
arXiv Detail & Related papers (2020-10-27T20:53:20Z)
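
As referenced in the multiplicative value function entry above, the idea of a safety critic that discounts a reward critic can be sketched as follows. This is a hedged illustration only: the PyTorch module, layer sizes, and names are assumptions, not the architecture or training procedure from that paper.

    import torch
    import torch.nn as nn

    # Illustrative multiplicative value function: a sigmoid safety critic
    # estimates the probability of remaining constraint-free and scales a
    # reward critic that models constraint-free returns only.
    class MultiplicativeCritic(nn.Module):
        def __init__(self, obs_dim, act_dim, hidden=64):
            super().__init__()
            self.reward_critic = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1))
            self.safety_critic = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid())  # P(no constraint violation)

        def forward(self, obs, act):
            x = torch.cat([obs, act], dim=-1)
            return self.safety_critic(x) * self.reward_critic(x)

A policy trained against this combined value is pushed toward actions whose estimated violation probability is low, because the multiplicative term shrinks the value of risky actions toward zero.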
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.