Safe Wasserstein Constrained Deep Q-Learning
- URL: http://arxiv.org/abs/2002.03016v4
- Date: Mon, 25 Oct 2021 20:13:22 GMT
- Title: Safe Wasserstein Constrained Deep Q-Learning
- Authors: Aaron Kandel, Scott J. Moura
- Abstract summary: This paper presents a distributionally robust Q-Learning algorithm (DrQ) which leverages Wasserstein ambiguity sets to provide idealistic probabilistic out-of-sample safety guarantees.
Using a case study of lithium-ion battery fast charging, we explore how idealistic safety guarantees translate to generally improved safety.
- Score: 2.088376060651494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a distributionally robust Q-Learning algorithm (DrQ)
which leverages Wasserstein ambiguity sets to provide idealistic probabilistic
out-of-sample safety guarantees during online learning. First, we follow past
work by separating the constraint functions from the principal objective to
create a hierarchy of machines which estimate the feasible state-action space
within the constrained Markov decision process (CMDP). DrQ works within this
framework by augmenting constraint costs with tightening offset variables
obtained through Wasserstein distributionally robust optimization (DRO). These
offset variables correspond to worst-case distributions of modeling error
characterized by the TD-errors of the constraint Q-functions. This procedure
allows us to safely approach the nominal constraint boundaries.
Using a case study of lithium-ion battery fast charging, we explore how
idealistic safety guarantees translate to generally improved safety relative to
conventional methods.
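To make the constraint-tightening idea concrete, below is a minimal Python sketch of how an offset derived from constraint TD-errors could shrink the feasible action set during greedy action selection. All names here (wasserstein_offset, safe_greedy_action) are illustrative, not from the paper's code, and the offset uses the standard mean-plus-radius bound for a 1-Wasserstein ball with a 1-Lipschitz loss as a stand-in for the DRO program the paper actually solves.

```python
import numpy as np

def wasserstein_offset(td_errors, radius):
    """Worst-case expected constraint-model error over a 1-Wasserstein ball
    of the given radius around the empirical TD-error distribution.
    For a 1-Lipschitz loss this supremum reduces to the empirical mean
    plus the radius (a standard Wasserstein-DRO bound)."""
    return float(np.mean(td_errors)) + radius

def safe_greedy_action(q_obj, q_con, state, actions, limit, offset):
    """Greedy selection over the estimated feasible action set: keep an
    action only if its constraint Q-value, tightened by the DRO offset,
    still satisfies the nominal constraint limit."""
    feasible = [a for a in actions if q_con(state, a) + offset <= limit]
    if not feasible:  # nothing deemed safe: fall back to least-violating action
        return min(actions, key=lambda a: q_con(state, a))
    return max(feasible, key=lambda a: q_obj(state, a))

# Toy usage with scalar states/actions and hand-written Q-estimates.
q_obj = lambda s, a: -(a - s) ** 2   # objective Q-function estimate
q_con = lambda s, a: abs(a)          # constraint-cost Q-function estimate
offset = wasserstein_offset(np.array([0.1, -0.05, 0.2]), radius=0.3)
action = safe_greedy_action(q_obj, q_con, 0.5, [0.0, 0.5, 1.0], limit=1.0, offset=offset)
```

The fallback branch reflects the tightening logic rather than the paper's exact recovery behavior: when the offset rules out every action, some least-violating choice must still be made.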
Related papers
- Learning Predictive Safety Filter via Decomposition of Robust Invariant Set [6.94348936509225]
This paper combines the advantages of both robust model predictive control (RMPC) and reinforcement learning (RL) to synthesize safety filters for nonlinear systems.
We propose a policy-based approach for robust reach problems and establish its computational complexity.
arXiv Detail & Related papers (2023-11-12T08:11:28Z)
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
- Online Constraint Tightening in Stochastic Model Predictive Control: A Regression Approach [49.056933332667114]
No analytical solutions exist for chance-constrained optimal control problems.
We propose a data-driven approach for learning the constraint-tightening parameters online during control.
Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints.
arXiv Detail & Related papers (2023-10-04T16:22:02Z)
- Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming [4.825619788907192]
Control Barrier Functions (CBFs) have attracted extensive attention for designing safe controllers for real-world safety-critical systems.
We present a distributionally robust CBF (DR-CBF) to achieve resilience under distributional shift.
We also provide an approximate variant of DR-CBF for higher-order systems.
arXiv Detail & Related papers (2023-09-15T18:45:09Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Robustness Guarantees for Credal Bayesian Networks via Constraint Relaxation over Probabilistic Circuits [16.997060715857987]
We develop a method to quantify the robustness of decision functions with respect to credal Bayesian networks.
We show how to obtain a guaranteed upper bound on MARmax in linear time in the size of the circuit.
arXiv Detail & Related papers (2022-05-11T22:37:07Z)
- Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)
- Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty [77.18483084440182]
Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively.
We present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs.
arXiv Detail & Related papers (2021-06-13T23:08:49Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)