Related papers: On Safety in Safe Bayesian Optimization

URL: http://arxiv.org/abs/2403.12948v1
Date: Tue, 19 Mar 2024 17:50:32 GMT
Title: On Safety in Safe Bayesian Optimization
Authors: Christian Fiedler, Johanna Menn, Lukas Kreisköther, Sebastian Trimpe
Abstract summary: We investigate three safety-related issues of the popular class of SafeOpt-type algorithms. First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression. Second, we identify assuming an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function as a central obstacle to real-world usage. Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems.
Score: 5.9045432488022485
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optimizing an unknown function under safety constraints is a central task in robotics, biomedical engineering, and many other disciplines, and increasingly safe Bayesian Optimization (BO) is used for this. Due to the safety critical nature of these applications, it is of utmost importance that theoretical safety guarantees for these algorithms translate into the real world. In this work, we investigate three safety-related issues of the popular class of SafeOpt-type algorithms. First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression, but concrete implementations typically utilize heuristics that invalidate all safety guarantees. We provide a detailed analysis of this problem and introduce Real-β-SafeOpt, a variant of the SafeOpt algorithm that leverages recent GP bounds and thus retains all theoretical guarantees. Second, we identify assuming an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function, a key technical assumption in SafeOpt-like algorithms, as a central obstacle to real-world usage. To overcome this challenge, we introduce the Lipschitz-only Safe Bayesian Optimization (LoSBO) algorithm, which guarantees safety without an assumption on the RKHS bound, and empirically show that this algorithm is not only safe, but also exhibits superior performance compared to the state-of-the-art on several function classes. Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems. To widen the applicability of these algorithms, we introduce Lipschitz-only GP-UCB (LoS-GP-UCB), a variant of LoSBO applicable to moderately high-dimensional problems, while retaining safety.

Related papers

Safety in safe Bayesian optimization and its ramifications for control [6.450289319821615]
In control engineering, parameters of a pre-designed controller are often tuned online in feedback with a plant. Machine learning methods, in particular Bayesian optimization (BO), have been deployed for this important problem. We identify two significant obstacles to practical safety. First, SafeOpt-type algorithms rely on quantitative uncertainty bounds, and most implementations replace these by theoretically unsupported heuristics. We propose Lipschitz-only Safe Bayesian Optimization (LoSBO), a safe BO algorithm that relies only on a known Lipschitz bound for its safety.
arXiv Detail & Related papers (2025-01-23T14:24:11Z)
PACSBO: Probably approximately correct safe Bayesian optimization [10.487548576958421]
We propose an algorithm that estimates an upper bound on the RKHS norm of an unknown function from data. We treat the RKHS norm as a local rather than a global object, and thus reduce conservatism. Integrating the RKHS norm estimation and the local interpretation of the RKHS norm into a safe BO algorithm yields PACSBO.
arXiv Detail & Related papers (2024-09-02T10:50:34Z)
Information-Theoretic Safe Bayesian Optimization [59.758009422067005]
We consider a sequential decision making task, where the goal is to optimize an unknown function without evaluating parameters that violate an unknown (safety) constraint. Most current methods rely on a discretization of the domain and cannot be directly extended to the continuous case. We propose an information-theoretic safe exploration criterion that directly exploits the GP posterior to identify the most informative safe parameters to evaluate.
arXiv Detail & Related papers (2024-02-23T14:31:10Z)
SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization. In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints. Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution of the generalized safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE. Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z)
Approximate Model-Based Shielding for Safe Reinforcement Learning [83.55437924143615]
We propose a principled look-ahead shielding algorithm for verifying the performance of learned RL policies. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
arXiv Detail & Related papers (2023-07-27T15:19:45Z)
Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL. We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection. To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
Benefits of Monotonicity in Safe Exploration with Gaussian Processes [50.71125084216603]
We consider the problem of sequentially maximising an unknown function over a set of actions. We show that M-SafeUCB enjoys theoretical guarantees in terms of safety, a suitably-defined regret notion, and approximately finding the entire safe boundary.
arXiv Detail & Related papers (2022-11-03T02:52:30Z)
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial. Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size. We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems. We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations. More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
Safe Policy Optimization with Local Generalized Linear Function Approximations [17.84511819022308]
Existing safe exploration methods guaranteed safety under the assumption of regularity. We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between a locally available feature obtained by sensors and environmental reward/safety. We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees.
arXiv Detail & Related papers (2021-11-09T00:47:50Z)
Regret Bounds for Safe Gaussian Process Bandit Optimization [42.336882999112845]
In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round.
arXiv Detail & Related papers (2020-05-05T03:54:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.