Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2207.10415v2
- Date: Fri, 2 Jun 2023 22:33:38 GMT
- Title: Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning
- Authors: Ilnura Usmanova, Yarden As, Maryam Kamgarpour, and Andreas Krause
- Abstract summary: We introduce a general approach for seeking a stationary point in high-dimensional non-linear stochastic optimization problems in which maintaining safety during learning is crucial.
Our approach, called LB-SGD, is based on applying stochastic gradient descent with a carefully chosen adaptive step size to a logarithmic barrier approximation of the original problem.
We demonstrate the effectiveness of our approach on minimizing constraint violation in policy search tasks in safe reinforcement learning.
- Score: 72.97229770329214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Optimizing noisy functions online, when evaluating the objective requires
experiments on a deployed system, is a crucial task arising in manufacturing,
robotics, and many other domains. Often, constraints on safe inputs are unknown ahead
of time, and we only obtain noisy information, indicating how close we are to
violating the constraints. Yet, safety must be guaranteed at all times, not
only for the final output of the algorithm.
We introduce a general approach for seeking a stationary point in high
dimensional non-linear stochastic optimization problems in which maintaining
safety during learning is crucial. Our approach, called LB-SGD, is based on
applying stochastic gradient descent (SGD) with a carefully chosen adaptive
step size to a logarithmic barrier approximation of the original problem. We
provide a complete convergence analysis of non-convex, convex, and
strongly-convex smooth constrained problems, with first-order and zeroth-order
feedback. Our approach yields efficient updates and scales better with
dimensionality compared to existing approaches.
We empirically compare the sample complexity and the computational cost of
our method with existing safe learning approaches. Beyond synthetic benchmarks,
we demonstrate the effectiveness of our approach on minimizing constraint
violation in policy search tasks in safe reinforcement learning (RL).
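To make the core mechanism concrete, below is a minimal first-order sketch of the log-barrier idea: take gradient steps on the barrier surrogate f(x) - eta * sum_i log(-g_i(x)) while capping the step size by a first-order estimate of the distance to the constraint boundary. The function names, the fixed barrier parameter eta, and the simple step-size cap are illustrative assumptions; they stand in for the paper's adaptive step-size rule and its zeroth-order estimators, and should not be read as the authors' exact algorithm.

```python
import numpy as np

def lb_sgd_sketch(f_grad, g_vals, g_grads, x0, eta=0.1, gamma=0.25, n_steps=100):
    """Minimal log-barrier SGD sketch for: minimize f(x) s.t. g_i(x) <= 0.

    f_grad(x)  -> (possibly noisy) gradient estimate of the objective
    g_vals(x)  -> 1-D array of constraint values g_i(x); x0 must be strictly feasible
    g_grads(x) -> 2-D array of constraint gradients, one row per constraint
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = g_vals(x)                      # all entries negative while feasible
        G = g_grads(x)
        # Gradient of the barrier surrogate f(x) - eta * sum_i log(-g_i(x))
        barrier_grad = f_grad(x) + eta * (G.T @ (1.0 / (-g)))
        direction = -barrier_grad
        # Conservative step size: move at most a fraction gamma of the
        # first-order estimated distance to the nearest constraint boundary.
        rates = G @ direction              # directional derivatives of each g_i
        t_max = np.inf
        for gi, ri in zip(g, rates):
            if ri > 0:                     # this constraint tightens along the step
                t_max = min(t_max, -gi / ri)
        step = min(1.0 / (np.linalg.norm(barrier_grad) + 1e-12), gamma * t_max)
        x = x + step * direction
    return x
```

A toy usage of the sketch on a hypothetical constrained quadratic:

```python
# Minimize ||x||^2 subject to x[0] >= 0.5, i.e. g(x) = 0.5 - x[0] <= 0
x_opt = lb_sgd_sketch(
    f_grad=lambda x: 2 * x,
    g_vals=lambda x: np.array([0.5 - x[0]]),
    g_grads=lambda x: np.array([[-1.0, 0.0]]),
    x0=np.array([1.0, 1.0]),
)
```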
Related papers
- Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time [0.6554326244334868]
We present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
We show that the learned policy is safe with high confidence.
We also demonstrate that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
arXiv Detail & Related papers (2024-03-23T20:22:30Z)
- Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Meta-Learning Priors for Safe Bayesian Optimization [72.8349503901712]
We build on a meta-learning algorithm, F-PACOH, capable of providing reliable uncertainty quantification in settings of data scarcity.
As a core contribution, we develop a novel framework for choosing safety-compliant priors in a data-driven manner.
On benchmark functions and a high-precision motion system, we demonstrate that our meta-learned priors accelerate the convergence of safe BO approaches.
arXiv Detail & Related papers (2022-10-03T08:38:38Z)
- Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning [24.56889192688925]
Reach-avoid optimal control problems are central to safety and liveness assurance for autonomous robotic systems.
Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive.
Recent work has shown promise in extending the reinforcement learning machinery to handle safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time.
arXiv Detail & Related papers (2021-12-23T00:44:38Z)
- Safe Online Convex Optimization with Unknown Linear Safety Constraints [0.0]
We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
The parameters that specify the linear safety constraints are unknown to the algorithm.
We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret of $O(T^{2/3})$.
arXiv Detail & Related papers (2021-11-14T19:49:19Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems [81.7983463275447]
Learning-based control algorithms require data collection with abundant supervision for training.
We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained optimal control with dynamics learning and feedback control.
arXiv Detail & Related papers (2020-05-09T05:57:43Z)