Distributionally Safe Reinforcement Learning under Model Uncertainty: A
Single-Level Approach by Differentiable Convex Programming
- URL: http://arxiv.org/abs/2310.02459v1
- Date: Tue, 3 Oct 2023 22:05:05 GMT
- Title: Distributionally Safe Reinforcement Learning under Model Uncertainty: A
Single-Level Approach by Differentiable Convex Programming
- Authors: Alaa Eddine Chriat and Chuangchuang Sun
- Abstract summary: We present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric.
To improve the tractability, we first use duality theory to transform the lower-level optimization from infinite-dimensional probability space to a finite-dimensional parametric space.
By differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules.
- Score: 4.825619788907192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety assurance is uncompromisable for safety-critical environments with the
presence of drastic model uncertainties (e.g., distributional shift),
especially with humans in the loop. However, incorporating uncertainty in safe
learning will naturally lead to a bi-level problem, where at the lower level
the (worst-case) safety constraint is evaluated within the uncertainty
ambiguity set. In this paper, we present a tractable distributionally safe
reinforcement learning framework to enforce safety under a distributional shift
measured by a Wasserstein metric. To improve the tractability, we first use
duality theory to transform the lower-level optimization from
infinite-dimensional probability space where distributional shift is measured,
to a finite-dimensional parametric space. Moreover, by differentiable convex
programming, the bi-level safe learning problem is further reduced to a
single-level one with two sequential computationally efficient modules: a
convex quadratic program to guarantee safety followed by a projected gradient
ascent to simultaneously find the worst-case uncertainty. This end-to-end
differentiable framework with safety constraints, to the best of our knowledge,
is the first tractable single-level solution to address distributional safety.
We test our approach on first and second-order systems with varying
complexities and compare our results with those of uncertainty-agnostic policies;
our approach demonstrates a significant improvement in safety guarantees.
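To make the two-module structure concrete, below is a minimal, hypothetical sketch of such a pipeline: a differentiable convex QP that filters a nominal action for safety (here built with cvxpylayers), followed by projected gradient ascent that searches for the worst-case model parameters. Because the dual transformation reduces the inner problem to a finite-dimensional parametric one, a norm-ball of parameter perturbations is used as a stand-in for the Wasserstein ambiguity set; all names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two-module pipeline: a differentiable convex QP
# that filters a nominal action for safety, followed by projected gradient
# ascent that searches for the worst-case model perturbation.
# Illustrative only; not the paper's code.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n_u = 2  # control dimension (illustrative)

# --- Module 1: convex QP safety filter ------------------------------------
#   minimize   ||u - u_nom||^2
#   subject to a^T u <= b
# where (a, b) encode the safety constraint evaluated under the current
# worst-case model parameters. The problem must be DPP-compliant for cvxpylayers.
u = cp.Variable(n_u)
u_nom = cp.Parameter(n_u)
a = cp.Parameter(n_u)
b = cp.Parameter()
qp = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), [a @ u <= b])
safety_layer = CvxpyLayer(qp, parameters=[u_nom, a, b], variables=[u])

# --- Module 2: projected gradient ascent on the uncertainty ---------------
def worst_case_params(violation_fn, theta_nominal, radius, steps=20, lr=0.1):
    """Ascend a safety-violation loss over perturbations of the model
    parameters, projecting back onto a ball of the given radius (a simple
    surrogate for the ambiguity set)."""
    delta = torch.zeros_like(theta_nominal, requires_grad=True)
    for _ in range(steps):
        loss = violation_fn(theta_nominal + delta)
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += lr * grad                 # gradient *ascent* on violation
            norm = delta.norm()
            if norm > radius:
                delta.mul_(radius / norm)      # projection onto the ball
    return (theta_nominal + delta).detach()

# Example call: filter a nominal action under fixed constraint coefficients.
u_safe, = safety_layer(torch.zeros(n_u),
                       torch.tensor([1.0, 0.0]),
                       torch.tensor(0.5))
```

In a training loop, the nominal action would pass through safety_layer with (a, b) recomputed from the parameters returned by worst_case_params, keeping the whole chain differentiable end to end.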
Related papers
- Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints [15.904640266226023]
We design a safety model that performs credit assignment to assess the contributions of partial state-action trajectories to safety.
We derive an effective algorithm for optimizing a safe policy using the learned safety model.
We devise a method to dynamically adapt the tradeoff coefficient between safety reward and safety compliance.
arXiv Detail & Related papers (2024-05-05T17:27:22Z) - Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications [73.58451824894568]
The widely adopted CMDP model constrains the risks in expectation, which makes room for dangerous behaviors in long-tail states.
In safety-critical domains, such behaviors could lead to disastrous outcomes.
We propose Objective Suppression, a novel method that adaptively suppresses the task reward maximizing objectives according to a safety critic.
arXiv Detail & Related papers (2024-02-23T23:22:06Z) - Robust Safe Reinforcement Learning under Adversarial Disturbances [12.145611442959602]
Safety is a primary concern when applying reinforcement learning to real-world control tasks.
Existing safe reinforcement learning algorithms rarely account for external disturbances.
This paper proposes a robust safe reinforcement learning framework that tackles worst-case disturbances.
arXiv Detail & Related papers (2023-10-11T05:34:46Z) - Risk-Averse Model Uncertainty for Distributionally Robust Safe
Reinforcement Learning [3.9821399546174825]
We introduce a deep reinforcement learning framework for safe decision making in uncertain environments.
We provide robustness guarantees for this framework by showing it is equivalent to a specific class of distributionally robust safe reinforcement learning problems.
In experiments on continuous control tasks with safety constraints, we demonstrate that our framework produces robust performance and safety at deployment time across a range of perturbed test environments.
arXiv Detail & Related papers (2023-01-30T00:37:06Z) - Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As our core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach effectively enforces hard safety constraints and significantly outperforms CMDP-based baseline methods in the system safety rate measured in simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions [60.26921219698514]
We introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers.
We then present the pointwise feasibility conditions of the resulting safety controller.
We use these conditions to devise an event-triggered online data collection strategy.
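For context, the uncertainty-free control barrier function (CBF) condition that such safety-critical controllers enforce pointwise is, in textbook form, the inequality below; the paper's contribution is a model-uncertainty-aware reformulation of this constraint together with conditions for its pointwise feasibility. The notation here is generic, not the paper's.

```latex
% Textbook CBF condition for a control-affine system \dot{x} = f(x) + g(x)u
% with safe set \mathcal{C} = \{x : h(x) \ge 0\} and extended class-K function \alpha.
\sup_{u \in \mathcal{U}} \Big[ L_f h(x) + L_g h(x)\, u \Big] \;\ge\; -\alpha\big(h(x)\big)
```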
arXiv Detail & Related papers (2022-08-23T05:02:09Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach in minimizing constraint violations in policy tasks in safe reinforcement learning.
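As a rough illustration of the log-barrier idea this entry relies on (a generic interior-point-style smoothing, not necessarily LBSGD's exact objective), a constrained problem min_x f(x) s.t. g_i(x) <= 0 is approximated by an unconstrained one:

```latex
% Generic log-barrier approximation with barrier weight \eta > 0; gradient
% steps on B_\eta stay strictly feasible provided the step size shrinks near
% the constraint boundary, which is where the careful step-size choice matters.
B_\eta(x) \;=\; f(x) \;-\; \eta \sum_{i} \log\big(-g_i(x)\big)
```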
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safe Learning of Uncertain Environments for Nonlinear Control-Affine
Systems [10.918870296899245]
We consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty.
We model uncertainty as a Gaussian signal and use state measurements to learn its mean and covariance bounds.
We show that with an arbitrarily large probability we can guarantee that the state will remain in the safe set, while learning and control are carried out simultaneously.
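A minimal sketch of the kind of estimation described here (fitting the mean and covariance of an additive Gaussian uncertainty from state-prediction residuals and inflating the covariance into a bound) is shown below; the residual construction and the inflation factor are illustrative assumptions, not the paper's estimator.

```python
# Hypothetical sketch: estimate mean and covariance of an additive Gaussian
# uncertainty from state-prediction residuals, then inflate the covariance
# into a conservative bound. Illustrative only; not the paper's estimator.
import torch

def estimate_uncertainty(residuals: torch.Tensor, inflation: float = 1.5):
    """residuals: (N, n_x) tensor of observed-minus-predicted next states."""
    mean = residuals.mean(dim=0)
    centered = residuals - mean
    cov = centered.T @ centered / (residuals.shape[0] - 1)  # sample covariance
    return mean, inflation * cov  # inflated bound for use by the safe controller

# Example: 100 two-dimensional residual samples.
mean, cov_bound = estimate_uncertainty(torch.randn(100, 2))
```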
arXiv Detail & Related papers (2021-03-02T01:58:02Z) - Context-Aware Safe Reinforcement Learning for Non-Stationary
Environments [24.75527261989899]
Safety is a critical concern when deploying reinforcement learning agents for realistic tasks.
We propose the context-aware safe reinforcement learning (CASRL) method to realize safe adaptation in non-stationary environments.
Results show that the proposed algorithm significantly outperforms existing baselines in terms of safety and robustness.
arXiv Detail & Related papers (2021-01-02T23:52:22Z) - Towards Safe Policy Improvement for Non-Stationary MDPs [48.9966576179679]
Many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable.
We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems.
Our proposed method extends a type of safe algorithm, called a Seldonian algorithm, through a synthesis of model-free reinforcement learning with time-series analysis.
arXiv Detail & Related papers (2020-10-23T20:13:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.