Safe Reinforcement Learning by Imagining the Near Future
- URL: http://arxiv.org/abs/2202.07789v1
- Date: Tue, 15 Feb 2022 23:28:24 GMT
- Title: Safe Reinforcement Learning by Imagining the Near Future
- Authors: Garrett Thomas, Yuping Luo, Tengyu Ma
- Abstract summary: In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future.
We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions.
Experiments demonstrate that our algorithm can achieve competitive rewards with fewer safety violations in several continuous control tasks.
- Score: 37.0376099401243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe reinforcement learning is a promising path toward applying reinforcement
learning algorithms to real-world problems, where suboptimal behaviors may lead
to actual negative consequences. In this work, we focus on the setting where
unsafe states can be avoided by planning ahead a short time into the future. In
this setting, a model-based agent with a sufficiently accurate model can avoid
unsafe states. We devise a model-based algorithm that heavily penalizes unsafe
trajectories, and derive guarantees that our algorithm can avoid unsafe states
under certain assumptions. Experiments demonstrate that our algorithm can
achieve competitive rewards with fewer safety violations in several continuous
control tasks.
Related papers
- Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time [0.6554326244334868]
We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
We show that the learned policy is safe with high confidence.
We also demonstrate that efficient exploration can be achieved by defining a subset of the state-space called proxy set.
arXiv Detail & Related papers (2024-03-23T20:22:30Z) - Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins.
We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z) - Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z) - Safe Deep Reinforcement Learning by Verifying Task-Level Properties [84.64203221849648]
Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL)
The cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space.
In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric.
arXiv Detail & Related papers (2023-02-20T15:24:06Z) - ProBF: Learning Probabilistic Safety Certificates with Barrier Functions [31.203344483485843]
The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics.
In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors.
We show the efficacy of this method through experiments on Segway and Quadrotor simulations.
arXiv Detail & Related papers (2021-12-22T20:18:18Z) - Learn Zero-Constraint-Violation Policy in Model-Free Constrained
Reinforcement Learning [7.138691584246846]
We propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions.
The safety index is designed to increase rapidly for potentially dangerous actions.
We claim that we can learn the energy function in a model-free manner similar to learning a value function.
arXiv Detail & Related papers (2021-11-25T07:24:30Z) - Sample-Efficient Safety Assurances using Conformal Prediction [57.92013073974406]
Early warning systems can provide alerts when an unsafe situation is imminent.
To reliably improve safety, these warning systems should have a provable false negative rate.
We present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics.
arXiv Detail & Related papers (2021-09-28T23:00:30Z) - Learning to Act Safely with Limited Exposure and Almost Sure Certainty [1.0323063834827415]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for exploratory trials.
We first focus on the canonical multi-armed bandit problem and seek to study the intrinsic trade-offs of learning safety in the presence of uncertainty.
arXiv Detail & Related papers (2021-05-18T18:05:12Z) - Learning to be safe, in finite time [4.189643331553922]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials.
We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic within safe learning.
arXiv Detail & Related papers (2020-10-01T14:03:34Z) - Chance-Constrained Trajectory Optimization for Safe Exploration and
Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.