On Safety in Safe Bayesian Optimization
- URL: http://arxiv.org/abs/2403.12948v1
- Date: Tue, 19 Mar 2024 17:50:32 GMT
- Title: On Safety in Safe Bayesian Optimization
- Authors: Christian Fiedler, Johanna Menn, Lukas Kreisköther, Sebastian Trimpe,
- Abstract summary: We investigate three safety-related issues of the popular class of SafeOpt-type algorithms.
First, these algorithms critically rely on frequentist bounds uncertainty for Gaussian Process (GP) regression.
Second, we identify assuming an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function.
Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems.
- Score: 5.9045432488022485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimizing an unknown function under safety constraints is a central task in robotics, biomedical engineering, and many other disciplines, and increasingly safe Bayesian Optimization (BO) is used for this. Due to the safety critical nature of these applications, it is of utmost importance that theoretical safety guarantees for these algorithms translate into the real world. In this work, we investigate three safety-related issues of the popular class of SafeOpt-type algorithms. First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression, but concrete implementations typically utilize heuristics that invalidate all safety guarantees. We provide a detailed analysis of this problem and introduce Real-\b{eta}-SafeOpt, a variant of the SafeOpt algorithm that leverages recent GP bounds and thus retains all theoretical guarantees. Second, we identify assuming an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function, a key technical assumption in SafeOpt-like algorithms, as a central obstacle to real-world usage. To overcome this challenge, we introduce the Lipschitz-only Safe Bayesian Optimization (LoSBO) algorithm, which guarantees safety without an assumption on the RKHS bound, and empirically show that this algorithm is not only safe, but also exhibits superior performance compared to the state-of-the-art on several function classes. Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems. To widen the applicability of these algorithms, we introduce Lipschitz-only GP-UCB (LoS-GP-UCB), a variant of LoSBO applicable to moderately high-dimensional problems, while retaining safety.
Related papers
- PACSBO: Probably approximately correct safe Bayesian optimization [10.487548576958421]
We propose an algorithm that estimates an upper bound on the RKHS norm of an unknown function from data.
We treat the RKHS norm as a local rather than a global object, and thus reduce conservatism.
Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO.
arXiv Detail & Related papers (2024-09-02T10:50:34Z) - Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint.
Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case.
We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z) - SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z) - Safe Exploration in Reinforcement Learning: A Generalized Formulation
and Algorithms [8.789204441461678]
We present a solution of the safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE.
Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z) - Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies.
Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system.
We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z) - Evaluating Model-free Reinforcement Learning toward Safety-critical
Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z) - Benefits of Monotonicity in Safe Exploration with Gaussian Processes [50.71125084216603]
We consider the problem of sequentially maximising an unknown function over a set of actions.
We show that textscsffamily M-SafeUCB enjoys theoretical guarantees in terms of safety, a suitably-defined regret notion, and approximately finding the entire safe boundary.
arXiv Detail & Related papers (2022-11-03T02:52:30Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safe Online Bid Optimization with Return-On-Investment and Budget
Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Safe Policy Optimization with Local Generalized Linear Function
Approximations [17.84511819022308]
Existing safe exploration methods guaranteed safety under the assumption of regularity.
We propose a novel algorithm, SPO-LF, that optimize an agent's policy while learning the relation between a locally available feature obtained by sensors and environmental reward/safety.
We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees.
arXiv Detail & Related papers (2021-11-09T00:47:50Z) - Regret Bounds for Safe Gaussian Process Bandit Optimization [42.336882999112845]
In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process.
We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round.
arXiv Detail & Related papers (2020-05-05T03:54:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.