Provably Safe PAC-MDP Exploration Using Analogies
- URL: http://arxiv.org/abs/2007.03574v2
- Date: Mon, 22 Mar 2021 14:28:54 GMT
- Title: Provably Safe PAC-MDP Exploration Using Analogies
- Authors: Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
- Abstract summary: A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration and safety.
We propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics.
Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.
- Score: 87.41775218021044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in applying reinforcement learning to safety-critical domains
is understanding how to balance exploration (needed to attain good performance
on the task) with safety (needed to avoid catastrophic failure). Although a
growing line of work in reinforcement learning has investigated this area of
"safe exploration," most existing techniques either 1) do not guarantee safety
during the actual exploration process; and/or 2) limit the problem to a priori
known and/or deterministic transition dynamics with strong smoothness
assumptions. Addressing this gap, we propose Analogous Safe-state Exploration
(ASE), an algorithm for provably safe exploration in MDPs with unknown,
stochastic dynamics. Our method exploits analogies between state-action pairs
to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE
also guides exploration towards the most task-relevant states, which
empirically results in significant improvements in terms of sample efficiency,
when compared to existing methods.
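The abstract describes ASE only at a high level. Below is a minimal, self-contained toy sketch of the underlying intuition, not the authors' ASE algorithm (which comes with formal PAC-MDP and safety guarantees): a safety label learned for one state-action pair is propagated to analogous pairs, and exploration is restricted to actions certified safe that way. All identifiers (SafeExplorer, analogy_fn, the corridor environment) are hypothetical illustrations introduced here, not taken from the paper.

```python
from collections import defaultdict


class SafeExplorer:
    """Toy safe-exploration agent that shares safety labels across analogy classes."""

    def __init__(self, actions, analogy_fn, safe_start):
        self.actions = actions            # available actions
        self.analogy_fn = analogy_fn      # maps (state, action) -> analogy-class key
        self.safe_keys = set()            # analogy classes believed safe
        self.unsafe_keys = set()          # analogy classes observed to be unsafe
        self.counts = defaultdict(int)    # visit counts, used as a naive exploration signal
        for s, a in safe_start:           # state-action pairs assumed safe a priori
            self.safe_keys.add(analogy_fn(s, a))

    def certified_safe(self, s, a):
        # A pair is treated as safe only if an analogous pair is known safe
        # and no analogous pair has ever been observed to be unsafe.
        key = self.analogy_fn(s, a)
        return key in self.safe_keys and key not in self.unsafe_keys

    def choose_action(self, s):
        # Explore (least-tried action first), but only within the certified-safe set.
        safe_actions = [a for a in self.actions if self.certified_safe(s, a)]
        if not safe_actions:
            return None                   # no provably safe action: stay put / fall back
        return min(safe_actions, key=lambda a: self.counts[(s, a)])

    def update(self, s, a, outcome_safe):
        # Propagate the observed outcome to the entire analogy class of (s, a).
        self.counts[(s, a)] += 1
        key = self.analogy_fn(s, a)
        (self.safe_keys if outcome_safe else self.unsafe_keys).add(key)


# Usage on a 1-D corridor where states >= 8 are unsafe: pairs are "analogous"
# when they share the action and a coarse distance-to-danger bucket.
def analogy(s, a):
    return (a, min(8 - s, 3))


explorer = SafeExplorer(actions=[-1, +1], analogy_fn=analogy,
                        safe_start=[(0, +1), (0, -1)])
state = 0
for _ in range(30):
    action = explorer.choose_action(state)
    if action is None:                    # the agent halts rather than risk an uncertified move
        break
    next_state = max(0, state + action)
    explorer.update(state, action, outcome_safe=next_state < 8)
    state = next_state
```

In this sketch the agent wanders only through states whose analogy classes carry a safety certificate and stops as soon as no certified action exists; the actual paper additionally shows how to direct that certified exploration towards task-relevant states while preserving the safety guarantee.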
Related papers
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution to the generalized safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE.
Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z)
- Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version) [1.279257604152629]
Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios.
Several methods exist to incorporate external knowledge or to use sensor data to limit the exploration of unsafe states.
In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement.
arXiv Detail & Related papers (2023-07-10T22:28:33Z)
- Safe Reinforcement Learning with Dead-Ends Avoidance and Recovery [13.333197887318168]
Safety is one of the main challenges in applying reinforcement learning to realistic environments.
We propose a method to construct a boundary that discriminates safe and unsafe states.
Our approach achieves better task performance with fewer safety violations than state-of-the-art algorithms.
arXiv Detail & Related papers (2023-06-24T12:02:50Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Safe Exploration Method for Reinforcement Learning under Existence of Disturbance [1.1470070927586016]
We deal with a safe exploration problem in reinforcement learning in the presence of a disturbance.
We propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance.
We illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
arXiv Detail & Related papers (2022-09-30T13:00:33Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing safety violations in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Conservative Safety Critics for Exploration [120.73241848565449]
We study the problem of safe exploration in reinforcement learning (RL).
We learn a conservative safety estimate of environment states through a critic.
We show that the proposed approach can achieve competitive task performance while incurring significantly lower catastrophic failure rates.
arXiv Detail & Related papers (2020-10-27T17:54:25Z)
- Safe Reinforcement Learning in Constrained Markov Decision Processes [20.175139766171277]
We propose an algorithm, SNO-MDP, that explores and optimizes Markov decision processes under unknown safety constraints.
We provide theoretical guarantees on both the satisfaction of the safety constraint and the near-optimality of the cumulative reward.
arXiv Detail & Related papers (2020-08-15T02:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.