Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis
- URL: http://arxiv.org/abs/2312.11314v1
- Date: Mon, 18 Dec 2023 16:09:43 GMT
- Title: Safeguarded Progress in Reinforcement Learning: Safe Bayesian
Exploration for Control Policy Synthesis
- Authors: Rohan Mitta, Hosein Hasanbeig, Jun Wang, Daniel Kroening, Yiannis
Kantaros, Alessandro Abate
- Abstract summary: This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
- Score: 63.532413807686524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL), such that safety-constraint violations are bounded at any point during learning. In a variety of RL applications the safety of the agent is particularly important, e.g. autonomous platforms or robots that work in proximity to humans. As enforcing safety during training might severely limit the agent's exploration, we propose here a new architecture that handles the trade-off between efficient progress and safety during exploration. As the exploration progresses, we use Bayesian inference to update Dirichlet-Categorical models of the transition probabilities of the Markov decision process that describes the environment dynamics. This paper proposes a way to approximate moments of the belief about the risk associated with the action-selection policy. We construct these approximations and prove convergence results. We propose a novel method for leveraging the expectation approximations to derive an approximate bound on the confidence that the risk is below a certain level. This approach can be easily interleaved with RL, and we present experimental results to showcase the performance of the overall architecture.
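Since the abstract names the moving parts (a Dirichlet-Categorical posterior over transition probabilities and a confidence estimate that policy risk stays below a level), a small sketch may help fix ideas. The following is a minimal illustration under stated assumptions: a toy finite MDP, risk defined as the horizon-bounded probability of reaching an unsafe state, and posterior sampling in place of the paper's analytic moment approximations. It is not the authors' implementation, and all names are illustrative.

```python
# Minimal sketch (not the authors' code): Dirichlet-Categorical posterior
# over MDP transitions, with Monte Carlo sampling standing in for the
# paper's analytic moment approximations of the risk belief.
import numpy as np

rng = np.random.default_rng(0)
S, A = 5, 2                  # assumed toy MDP: 5 states, 2 actions
UNSAFE = [4]                 # assumed unsafe state(s)

# One Dirichlet count vector over next states per (state, action) pair.
alpha = np.ones((S, A, S))   # uniform prior

def update(s, a, s_next):
    """Bayesian update: conjugacy reduces to incrementing one count."""
    alpha[s, a, s_next] += 1.0

def risk(P, policy, horizon=20, start=0):
    """Risk of a deterministic policy under kernel P[s, a, s']: the
    probability of entering an unsafe state within `horizon` steps
    (unsafe states are made absorbing)."""
    M = np.zeros((S, S))
    for s in range(S):
        M[s] = P[s, policy[s]]
    for u in UNSAFE:
        M[u] = 0.0
        M[u, u] = 1.0
    p = np.zeros(S)
    p[start] = 1.0
    for _ in range(horizon):
        p = p @ M
    return p[UNSAFE].sum()

def confidence_risk_below(policy, level=0.2, n_samples=500):
    """Estimate Pr[risk <= level] under the current Dirichlet posterior.
    The paper instead derives an approximate bound from expectation
    approximations; sampling is the simplest stand-in."""
    hits = 0
    for _ in range(n_samples):
        P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(A)]
                      for s in range(S)])
        if risk(P, policy) <= level:
            hits += 1
    return hits / n_samples

# Illustrative use: record observed transitions, then gate exploration on
# the estimated confidence that the current policy's risk is below 0.2.
for s, a, s2 in [(0, 0, 1), (1, 0, 2), (2, 0, 3), (0, 1, 4)]:
    update(s, a, s2)
policy = np.zeros(S, dtype=int)   # "always take action 0"
print(confidence_risk_below(policy))
```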
Related papers
- State-Wise Safe Reinforcement Learning With Pixel Observations [12.338614299403305]
We propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions.
As a joint learning framework, our approach begins by constructing a latent dynamics model with low-dimensional latent spaces derived from pixel observations.
We then build and learn a latent barrier-like function on top of the latent dynamics and conduct policy optimization simultaneously, thereby improving both safety and the total expected return (a toy barrier-loss sketch follows this entry).
arXiv Detail & Related papers (2023-11-03T20:32:30Z)
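The "barrier-like function" above has a standard neural-certificate reading that a few lines can make concrete. Below is a toy sketch under assumed shapes, with random tensors standing in for the encoder outputs and the latent dynamics model's one-step predictions; the paper's actual objective may differ.

```python
# Toy rendering of a latent barrier-like training signal (assumed form,
# not the paper's loss): a scalar network h should be positive on safe
# latents, negative on unsafe ones, and not decay too fast along one-step
# latent transitions.
import torch
from torch import nn

h = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))

def barrier_loss(z_safe, z_unsafe, z, z_next, alpha=0.1):
    safe = torch.relu(-h(z_safe)).mean()      # want h >= 0 on safe latents
    unsafe = torch.relu(h(z_unsafe)).mean()   # want h <  0 on unsafe latents
    # Discrete-time barrier condition: h(z') >= (1 - alpha) * h(z).
    decay = torch.relu((1 - alpha) * h(z) - h(z_next)).mean()
    return safe + unsafe + decay

# Illustrative call with random latent batches.
z_safe, z_unsafe = torch.randn(16, 8), torch.randn(16, 8)
z, z_next = torch.randn(16, 8), torch.randn(16, 8)
print(barrier_loss(z_safe, z_unsafe, z, z_next).item())
```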
- Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version) [1.279257604152629]
Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios.
Several methods exist to incorporate external knowledge or to use sensor data to limit the exploration of unsafe states.
In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement.
arXiv Detail & Related papers (2023-07-10T22:28:33Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations (a minimal shield sketch follows this entry).
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
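Shielding has a simple operational core that is easy to sketch: override an agent's proposed action whenever an approximate model puts its near-term violation probability above a budget. The interface below is a hedged assumption; `violation_prob` is a hypothetical estimator, which in the paper's setting would come from a learned model rather than being supplied directly.

```python
# Minimal shield sketch (assumed interface, not the paper's algorithm).
from typing import Callable, Sequence

def shielded_action(state: object,
                    proposed: int,
                    actions: Sequence[int],
                    violation_prob: Callable[[object, int], float],
                    budget: float = 0.05) -> int:
    """Keep the proposed action if it looks safe enough; otherwise fall
    back to the action with the lowest estimated violation probability."""
    if violation_prob(state, proposed) <= budget:
        return proposed
    return min(actions, key=lambda a: violation_prob(state, a))

# Illustrative use with a made-up estimator.
est = lambda s, a: {0: 0.01, 1: 0.30}[a]
print(shielded_action(state=None, proposed=1, actions=[0, 1],
                      violation_prob=est))   # falls back to action 0
```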
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates (a toy critic sketch follows this entry).
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
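A toy tabular rendering of the conservative-safety-critic idea, under loud assumptions: the paper trains a neural critic with a conservative objective, whereas this sketch substitutes a simple count-based pessimism bonus so that untried actions look risky until evidence accrues.

```python
# Toy tabular sketch of a conservative safety critic (illustrative, not
# the paper's training objective): the critic tracks each state-action
# pair's failure probability; a count-based bonus stands in for the
# paper's conservatism mechanism.
import numpy as np

class ConservativeSafetyCritic:
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99, bonus=0.5):
        self.q = np.zeros((n_states, n_actions))   # estimated failure prob
        self.n = np.zeros((n_states, n_actions))   # visit counts
        self.lr, self.gamma, self.bonus = lr, gamma, bonus

    def update(self, s, a, failed, s_next, a_next):
        # Safety Bellman backup: immediate failure, else the discounted
        # failure probability of the successor pair.
        target = 1.0 if failed else self.gamma * self.q[s_next, a_next]
        self.q[s, a] += self.lr * (target - self.q[s, a])
        self.n[s, a] += 1

    def safe_actions(self, s, eps=0.1):
        # Conservative screen: uncertainty counts against safety, so the
        # bonus shrinks only as evidence accumulates.
        pessimistic = self.q[s] + self.bonus / np.sqrt(1.0 + self.n[s])
        return np.flatnonzero(pessimistic <= eps)

critic = ConservativeSafetyCritic(n_states=3, n_actions=2)
critic.update(0, 1, failed=True, s_next=0, a_next=0)
print(critic.safe_actions(0, eps=0.3))   # empty: nothing proven safe yet
```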
- Provably Safe PAC-MDP Exploration Using Analogies [87.41775218021044]
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration and safety.
We propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown dynamics.
Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.
arXiv Detail & Related papers (2020-07-07T15:50:50Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
- Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)