Towards Safe Policy Improvement for Non-Stationary MDPs
- URL: http://arxiv.org/abs/2010.12645v2
- Date: Thu, 17 Dec 2020 20:26:20 GMT
- Title: Towards Safe Policy Improvement for Non-Stationary MDPs
- Authors: Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas
- Abstract summary: Many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable.
We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems.
Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis.
- Score: 48.9966576179679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world sequential decision-making problems involve critical systems
with financial risks and human-life risks. While several works in the past have
proposed methods that are safe for deployment, they assume that the underlying
problem is stationary. However, many real-world problems of interest exhibit
non-stationarity, and when stakes are high, the cost associated with a false
stationarity assumption may be unacceptable. We take the first steps towards
ensuring safety, with high confidence, for smoothly-varying non-stationary
decision problems. Our proposed method extends a type of safe algorithm, called
a Seldonian algorithm, through a synthesis of model-free reinforcement learning
with time-series analysis. Safety is ensured using sequential hypothesis
testing of a policy's forecasted performance, and confidence intervals are
obtained using wild bootstrap.
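To make the last sentence of the abstract concrete, here is a minimal sketch of a wild-bootstrap confidence interval around a forecast of future performance. It is not the authors' implementation. The polynomial time basis, the Rademacher sign weights, the percentile interval, and every name (`design`, `wild_bootstrap_forecast`, `passes_safety_test`) are assumptions made for illustration. The input `y` stands for past per-episode performance estimates of a policy, however those were obtained.

```python
import numpy as np


def design(t, degree):
    """Polynomial time features [1, t, t^2, ...] (the basis is an assumption)."""
    t = np.asarray(t, dtype=float)
    return np.vander(t, degree + 1, increasing=True)


def wild_bootstrap_forecast(y, horizon=1, degree=1, n_boot=2000, alpha=0.05, seed=0):
    """Forecast performance `horizon` steps ahead with a percentile interval.

    Returns (forecast, lower, upper), where [lower, upper] is an approximate
    (1 - alpha) interval from a wild bootstrap with Rademacher weights.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Rescale time to roughly [0, 1] to keep the least-squares fit well conditioned.
    X = design(np.arange(n) / n, degree)
    x_new = design([(n - 1 + horizon) / n], degree)[0]

    # Ordinary least-squares trend fit and point forecast.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    forecast = x_new @ beta

    # Wild bootstrap: flip the sign of each residual at random, rebuild
    # pseudo-observations, refit, and forecast again.
    signs = rng.choice([-1.0, 1.0], size=(n_boot, n))
    y_star = fitted + resid * signs                      # shape (n_boot, n)
    betas, *_ = np.linalg.lstsq(X, y_star.T, rcond=None)  # shape (degree + 1, n_boot)
    boot_forecasts = x_new @ betas

    lower, upper = np.quantile(boot_forecasts, [alpha / 2, 1 - alpha / 2])
    return float(forecast), float(lower), float(upper)


def passes_safety_test(lower, baseline):
    """Hypothetical check: accept only if the lower bound beats the baseline."""
    return lower > baseline


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic, slowly drifting performance estimates (illustrative data only).
    y = 1.0 + 0.05 * np.arange(40) + rng.normal(scale=0.2, size=40)
    forecast, lower, upper = wild_bootstrap_forecast(y)
    print(forecast, lower, upper, passes_safety_test(lower, baseline=2.0))
```

The sign flips keep each residual attached to its own time step, so the resampled series retain the original pattern of noise magnitudes over time, which is the usual motivation for a wild bootstrap over resampling residuals with replacement.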
Related papers
- Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time [0.6554326244334868]
We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
We show that the learned policy is safe with high confidence.
We also demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
arXiv Detail & Related papers (2024-03-23T20:22:30Z)
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision-making task where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- One step closer to unbiased aleatoric uncertainty estimation [71.55174353766289]
We propose a new estimation method by actively de-noising the observed data.
By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
arXiv Detail & Related papers (2023-12-16T14:59:11Z)
- Information-Theoretic Safe Exploration with Gaussian Processes [89.31922008981735]
We consider a sequential decision-making task where we are not allowed to evaluate parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2022-12-09T15:23:58Z)
- Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
arXiv Detail & Related papers (2022-08-23T05:02:09Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Context-Aware Safe Reinforcement Learning for Non-Stationary Environments [24.75527261989899]
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks.
We propose the context-aware safe reinforcement learning (CASRL) method to realize safe adaptation in non-stationary environments.
Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
arXiv Detail & Related papers (2021-01-02T23:52:22Z)
- Efficient falsification approach for autonomous vehicle validation using a parameter optimisation technique based on reinforcement learning [6.198523595657983]
The widescale deployment of Autonomous Vehicles (AV) appears to be imminent despite many safety challenges that are yet to be resolved.
The uncertainties in the behaviour of the traffic participants and the dynamic world cause reactions in advanced autonomous systems.
This paper presents an efficient falsification method to evaluate the System Under Test.
arXiv Detail & Related papers (2020-11-16T02:56:13Z)
- Verifiably Safe Exploration for End-to-End Reinforcement Learning [17.401496872603943]
This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs.
It is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints.
arXiv Detail & Related papers (2020-07-02T16:12:20Z)