Safe Exploration Method for Reinforcement Learning under Existence of
Disturbance
- URL: http://arxiv.org/abs/2209.15452v2
- Date: Mon, 20 Mar 2023 06:46:22 GMT
- Title: Safe Exploration Method for Reinforcement Learning under Existence of
Disturbance
- Authors: Yoshihiro Okawa, Tomotake Sasaki, Hitoshi Yanami, Toru Namerikawa
- Abstract summary: We deal with a safe exploration problem in reinforcement learning under the existence of disturbance.
We propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance.
We illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent rapid developments in reinforcement learning algorithms have
opened up novel possibilities in many fields. However, because these algorithms
are exploratory by nature, we have to take risk into consideration when applying
them to safety-critical problems, especially in real environments. In this
study, we deal with a safe exploration problem in reinforcement learning under
the existence of disturbance. We define the safety during learning as
satisfaction of the constraint conditions explicitly defined in terms of the
state and propose a safe exploration method that uses partial prior knowledge
of a controlled object and disturbance. The proposed method assures the
satisfaction of the explicit state constraints with a pre-specified probability
even if the controlled object is exposed to a stochastic disturbance following
a normal distribution. As theoretical results, we introduce sufficient
conditions for constructing the conservative inputs (inputs with no exploratory
component) used in the proposed method, and prove that safety in the sense
defined above is guaranteed by the proposed method. Furthermore, we illustrate the
validity and effectiveness of the proposed method through numerical simulations
of an inverted pendulum and a four-bar parallel link robot manipulator.
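The idea of keeping a state constraint satisfied with a pre-specified probability under a normally distributed disturbance can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a hypothetical scalar linear system x_next = a*x + b*u + w with w ~ N(mu_w, sigma_w^2), a single upper bound x_max on the state, and a conservative input that is simply given (the paper instead derives sufficient conditions for constructing it). All names and parameter values below are illustrative assumptions.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_next_state_safe(x, u, a, b, mu_w, sigma_w, x_max):
    """Probability that x_next = a*x + b*u + w stays at or below x_max.

    Assumes (hypothetically) a scalar linear model and w ~ N(mu_w, sigma_w^2).
    """
    mean_next = a * x + b * u + mu_w
    return normal_cdf((x_max - mean_next) / sigma_w)


def select_input(x, u_explore, u_conservative, a, b, mu_w, sigma_w, x_max, eta):
    """Use the exploratory input only if it keeps the one-step chance
    constraint P(x_next <= x_max) >= eta; otherwise fall back to the
    conservative input. Returns the chosen input and its safety probability.
    """
    p_explore = prob_next_state_safe(x, u_explore, a, b, mu_w, sigma_w, x_max)
    if p_explore >= eta:
        return u_explore, p_explore
    p_cons = prob_next_state_safe(x, u_conservative, a, b, mu_w, sigma_w, x_max)
    return u_conservative, p_cons


if __name__ == "__main__":
    # Illustrative values only; none of them come from the paper.
    params = dict(a=1.0, b=1.0, mu_w=0.0, sigma_w=0.1, x_max=1.0, eta=0.95)
    for x in (0.5, 0.9):
        u, p = select_input(x, u_explore=0.2, u_conservative=-0.5, **params)
        print(f"x={x}: chosen input={u}, P(safe)={p:.4f}")
```

Far from the bound the exploratory input passes the check and is applied; near the bound the check fails and the conservative input is used instead. A one-step check like this is only meaningful if the conservative input itself keeps the constraint satisfied with the required probability, which is the kind of condition the abstract says the paper establishes.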
Related papers
- Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time [0.6554326244334868]
We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
We show that the learned policy is safe with high confidence.
We also demonstrate that efficient exploration can be achieved by defining a subset of the state-space called proxy set.
arXiv Detail & Related papers (2024-03-23T20:22:30Z)
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version) [1.279257604152629]
Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios.
Several methods exist to incorporate external knowledge or to use sensor data to limit the exploration of unsafe states.
In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement.
arXiv Detail & Related papers (2023-07-10T22:28:33Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Information-Theoretic Safe Exploration with Gaussian Processes [89.31922008981735]
We consider a sequential decision making task where we are not allowed to evaluate parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2022-12-09T15:23:58Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction [2.127049691404299]
We propose an automatic exploration process adjustment method for safe reinforcement learning algorithms.
Our proposed method automatically selects whether the exploratory input is used or not at each time depending on the state and its predicted value.
Our method theoretically guarantees the satisfaction of the constraints with the pre-specified probability, that is, the satisfaction of a joint chance constraint at every time.
arXiv Detail & Related papers (2021-03-05T13:30:53Z)
- Context-Aware Safe Reinforcement Learning for Non-Stationary Environments [24.75527261989899]
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks.
We propose the context-aware safe reinforcement learning (CASRL) method to realize safe adaptation in non-stationary environments.
Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
arXiv Detail & Related papers (2021-01-02T23:52:22Z)
- Towards Safe Policy Improvement for Non-Stationary MDPs [48.9966576179679]
Many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable.
We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems.
Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis.
arXiv Detail & Related papers (2020-10-23T20:13:51Z)
- Provably Safe PAC-MDP Exploration Using Analogies [87.41775218021044]
A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration and safety.
We propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown dynamics.
Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense.
arXiv Detail & Related papers (2020-07-07T15:50:50Z)
- Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)