Safe Wasserstein Constrained Deep Q-Learning
- URL: http://arxiv.org/abs/2002.03016v4
- Date: Mon, 25 Oct 2021 20:13:22 GMT
- Title: Safe Wasserstein Constrained Deep Q-Learning
- Authors: Aaron Kandel, Scott J. Moura
- Abstract summary: This paper presents a distributionally robust Q-Learning algorithm (DrQ) which leverages Wasserstein ambiguity sets to provide idealistic probabilistic out-of-sample safety guarantees.
Using a case study of lithium-ion battery fast charging, we explore how idealistic safety guarantees translate to generally improved safety.
- Score: 2.088376060651494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a distributionally robust Q-Learning algorithm (DrQ)
which leverages Wasserstein ambiguity sets to provide idealistic probabilistic
out-of-sample safety guarantees during online learning. First, we follow past
work by separating the constraint functions from the principal objective to
create a hierarchy of machines which estimate the feasible state-action space
within the constrained Markov decision process (CMDP). DrQ works within this
framework by augmenting constraint costs with tightening offset variables
obtained through Wasserstein distributionally robust optimization (DRO). These
offset variables correspond to worst-case distributions of modeling error
characterized by the TD-errors of the constraint Q-functions. This procedure
allows us to safely approach the nominal constraint boundaries.
Using a case study of lithium-ion battery fast charging, we explore how
idealistic safety guarantees translate to generally improved safety relative to
conventional methods.
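To make the constraint-tightening idea concrete, below is a minimal Python sketch of how an offset derived from constraint TD-errors could shrink the feasible action set during greedy action selection. All names here (wasserstein_offset, safe_greedy_action) are illustrative, not from the paper's code, and the offset uses the standard mean-plus-radius bound for a 1-Wasserstein ball with a 1-Lipschitz loss as a stand-in for the DRO program the paper actually solves.

```python
import numpy as np

def wasserstein_offset(td_errors, radius):
    """Worst-case expected constraint-model error over a 1-Wasserstein ball
    of the given radius around the empirical TD-error distribution.
    For a 1-Lipschitz loss this supremum reduces to the empirical mean
    plus the radius (a standard Wasserstein-DRO bound)."""
    return float(np.mean(td_errors)) + radius

def safe_greedy_action(q_obj, q_con, state, actions, limit, offset):
    """Greedy selection over the estimated feasible action set: keep an
    action only if its constraint Q-value, tightened by the DRO offset,
    still satisfies the nominal constraint limit."""
    feasible = [a for a in actions if q_con(state, a) + offset <= limit]
    if not feasible:  # nothing deemed safe: fall back to least-violating action
        return min(actions, key=lambda a: q_con(state, a))
    return max(feasible, key=lambda a: q_obj(state, a))

# Toy usage with scalar states/actions and hand-written Q-estimates.
q_obj = lambda s, a: -(a - s) ** 2   # objective Q-function estimate
q_con = lambda s, a: abs(a)          # constraint-cost Q-function estimate
offset = wasserstein_offset(np.array([0.1, -0.05, 0.2]), radius=0.3)
action = safe_greedy_action(q_obj, q_con, 0.5, [0.0, 0.5, 1.0], limit=1.0, offset=offset)
```

The fallback branch reflects the tightening logic rather than the paper's exact recovery behavior: when the offset rules out every action, some least-violating choice must still be made.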
Related papers
- Learning Predictive Safety Filter via Decomposition of Robust Invariant Set [6.94348936509225]
This paper combines the advantages of both robust model predictive control (RMPC) and reinforcement learning (RL) to synthesize safety filters for nonlinear systems.
We propose a policy-based approach for robust reach problems and establish its computational complexity.
arXiv Detail & Related papers (2023-11-12T08:11:28Z)
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
- Online Constraint Tightening in Stochastic Model Predictive Control: A Regression Approach [49.056933332667114]
No analytical solutions exist for chance-constrained optimal control problems.
We propose a data-driven approach for learning the constraint-tightening parameters online during control.
Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints.
arXiv Detail & Related papers (2023-10-04T16:22:02Z)
- Wasserstein Distributionally Robust Control Barrier Function using Conditional Value-at-Risk with Differentiable Convex Programming [4.825619788907192]
Control Barrier Functions (CBFs) have attracted extensive attention for designing safe controllers for real-world safety-critical systems.
We present a distributionally robust CBF (DR-CBF) to achieve resilience under distributional shift.
We also provide an approximate variant of DR-CBF for higher-order systems.
arXiv Detail & Related papers (2023-09-15T18:45:09Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Robustness Guarantees for Credal Bayesian Networks via Constraint Relaxation over Probabilistic Circuits [16.997060715857987]
We develop a method to quantify the robustness of decision functions with respect to credal Bayesian networks.
We show how to obtain a guaranteed upper bound on MARmax in linear time in the size of the circuit.
arXiv Detail & Related papers (2022-05-11T22:37:07Z)
- Lyapunov-based uncertainty-aware safe reinforcement learning [0.0]
Reinforcement learning (RL) has shown promising performance in learning optimal policies for a variety of sequential decision-making tasks.
In many real-world RL problems, besides optimizing the main objectives, the agent is expected to satisfy a certain level of safety.
We propose a Lyapunov-based uncertainty-aware safe RL model to address these limitations.
arXiv Detail & Related papers (2021-07-29T13:08:15Z)
- Pointwise Feasibility of Gaussian Process-based Safety-Critical Control under Model Uncertainty [77.18483084440182]
Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs) are popular tools for enforcing safety and stability of a controlled system, respectively.
We present a Gaussian Process (GP)-based approach to tackle the problem of model uncertainty in safety-critical controllers that use CBFs and CLFs.
arXiv Detail & Related papers (2021-06-13T23:08:49Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Robust Reinforcement Learning with Wasserstein Constraint [49.86490922809473]
We show the existence of optimal robust policies, provide a sensitivity analysis for the perturbations, and then design a novel robust learning algorithm.
The effectiveness of the proposed algorithm is verified in the Cart-Pole environment.
arXiv Detail & Related papers (2020-06-01T13:48:59Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)