Regret Bounds for Safe Gaussian Process Bandit Optimization
- URL: http://arxiv.org/abs/2005.01936v1
- Date: Tue, 5 May 2020 03:54:43 GMT
- Title: Regret Bounds for Safe Gaussian Process Bandit Optimization
- Authors: Sanae Amani, Mahnoosh Alizadeh, Christos Thrampoulidis
- Abstract summary: In safety-critical systems, it is paramount that the learner's actions do not violate the safety constraints at any stage of the learning process.
We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round.
- Score: 42.336882999112845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many applications require a learner to make sequential decisions given
uncertainty regarding both the system's payoff function and safety constraints.
In safety-critical systems, it is paramount that the learner's actions do not
violate the safety constraints at any stage of the learning process. In this
paper, we study a stochastic bandit optimization problem where the unknown
payoff and constraint functions are sampled from Gaussian Processes (GPs) first
considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB
called SGP-UCB, with necessary modifications to respect safety constraints at
every round. The algorithm has two distinct phases. The first phase seeks to
estimate the set of safe actions in the decision set, while the second phase
follows the GP-UCB decision rule. Our main contribution is to derive the first
sub-linear regret bounds for this problem. We numerically compare SGP-UCB
against existing safe Bayesian GP optimization algorithms.
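To make the two-phase structure concrete, here is a minimal Python sketch of a safe GP-UCB-style loop on a finite one-dimensional decision set. It is an illustration under simplifying assumptions, not the paper's algorithm: the kernel, the exploration parameter `beta`, the safety threshold `h`, the phase length, and the safe seed action are hypothetical choices, and scikit-learn's GP regressor stands in for the paper's posterior updates.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Minimal sketch of a two-phase safe GP-UCB loop on a finite 1-D decision set.
# The payoff f, constraint g, threshold h, beta, phase length, and safe seed
# are all illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 100).reshape(-1, 1)      # finite decision set


def f(x):  # unknown payoff (simulated here for illustration)
    return np.sin(6 * x).ravel()


def g(x):  # unknown safety constraint (simulated here for illustration)
    return np.cos(6 * x).ravel()


h = -0.5                                           # safety requirement: g(x) >= h
beta = 2.0                                         # exploration parameter
safe_seed = np.array([[0.5]])                      # action assumed safe a priori
X_obs, y_obs, c_obs = [safe_seed], [f(safe_seed)], [g(safe_seed)]

gp_f = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-2, optimizer=None)
gp_g = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-2, optimizer=None)

for t in range(30):
    gp_f.fit(np.vstack(X_obs), np.concatenate(y_obs))
    gp_g.fit(np.vstack(X_obs), np.concatenate(c_obs))
    mu_f, sd_f = gp_f.predict(X, return_std=True)
    mu_g, sd_g = gp_g.predict(X, return_std=True)

    # Conservative safe-set estimate: constraint lower confidence bound above h.
    safe = mu_g - beta * sd_g >= h
    if not safe.any():
        safe[np.argmin(np.abs(X - safe_seed).ravel())] = True   # fall back to the seed

    if t < 10:
        # Phase 1: grow the safe-set estimate by querying the most uncertain safe action.
        idx = np.flatnonzero(safe)[np.argmax(sd_g[safe])]
    else:
        # Phase 2: GP-UCB rule restricted to the estimated safe set.
        idx = np.flatnonzero(safe)[np.argmax((mu_f + beta * sd_f)[safe])]

    x_t = X[idx:idx + 1]
    X_obs.append(x_t)
    y_obs.append(f(x_t) + 0.05 * rng.standard_normal(1))        # noisy payoff feedback
    c_obs.append(g(x_t) + 0.05 * rng.standard_normal(1))        # noisy constraint feedback

print("best observed payoff among queried safe actions:", float(np.max(np.concatenate(y_obs))))
```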
Related papers
- On Safety in Safe Bayesian Optimization [5.9045432488022485]
We investigate three safety-related issues of the popular class of SafeOpt-type algorithms.
First, these algorithms critically rely on frequentist uncertainty bounds for Gaussian Process (GP) regression.
Second, we identify the assumption of an upper bound on the reproducing kernel Hilbert space (RKHS) norm of the target function as a further issue.
Third, SafeOpt and derived algorithms rely on a discrete search space, making them difficult to apply to higher-dimensional problems.
arXiv Detail & Related papers (2024-03-19T17:50:32Z)
- Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP.
We prove that our algorithm achieves sublinear regret without error cancellations.
arXiv Detail & Related papers (2024-02-24T09:47:46Z)
- SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization [1.3597551064547502]
This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization.
In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints.
Our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards.
arXiv Detail & Related papers (2023-11-01T22:12:50Z)
- Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms [8.789204441461678]
We present a solution to the generalized safe exploration (GSE) problem in the form of a meta-algorithm for safe exploration, MASE.
Our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks.
arXiv Detail & Related papers (2023-10-05T00:47:09Z)
- On the Sublinear Regret of GP-UCB [58.25014663727544]
We show that the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm enjoys nearly optimal regret rates.
Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
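For context on the decision rule this result analyzes, the sketch below computes the GP-UCB score $\mu + \beta\sigma$ with the posterior mean written explicitly as a kernel ridge estimator, exposing the regularization parameter `lam` that the contribution above ties to kernel smoothness. The kernel choice and the values of `lam` and `beta` are illustrative assumptions, not the paper's setting.

```python
import numpy as np


def rbf(A, B, ls=0.2):
    """Squared-exponential kernel matrix between row sets A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / ls ** 2)


def gp_ucb_scores(X_obs, y_obs, X_cand, lam=0.1, beta=2.0):
    """UCB scores mu + beta * sigma, with mu written as a kernel ridge estimate
    whose regularizer lam plays the role discussed above."""
    K = rbf(X_obs, X_obs) + lam * np.eye(len(X_obs))
    K_cand = rbf(X_cand, X_obs)
    mu = K_cand @ np.linalg.solve(K, y_obs)                      # kernel ridge / GP posterior mean
    var = 1.0 - np.einsum("ij,ji->i", K_cand, np.linalg.solve(K, K_cand.T))
    sigma = np.sqrt(np.clip(var, 0.0, None))                     # posterior standard deviation
    return mu + beta * sigma


# Toy usage: pick the next query point on a 1-D grid.
X_obs = np.array([[0.2], [0.8]])
y_obs = np.array([0.1, 0.7])
X_cand = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
x_next = X_cand[np.argmax(gp_ucb_scores(X_obs, y_obs, X_cand))]
print("next query:", x_next)
```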
arXiv Detail & Related papers (2023-07-14T13:56:11Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
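The log-barrier idea can be sketched in a few lines. The snippet below is a hypothetical illustration of an LBSGD-style update for a single smooth constraint $g(x) \le 0$, with gradients taken by finite differences and a step size that shrinks near the constraint boundary; the step-size rule, constants, and helper names here are assumptions for intuition only, not the authors' implementation.

```python
import numpy as np


def barrier_objective(x, f, g, eta=0.1):
    """Smoothed objective f(x) - eta * log(-g(x)), defined only where g(x) < 0."""
    return f(x) - eta * np.log(-g(x))


def lbsgd_step(x, f, g, eta=0.1, eps=1e-6):
    """One log-barrier gradient step with a conservative step size near the boundary."""
    grad = np.zeros_like(x)
    for i in range(len(x)):                       # finite-difference gradient of the barrier objective
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (barrier_objective(x + e, f, g, eta) -
                   barrier_objective(x - e, f, g, eta)) / (2 * eps)
    # Step size shrinks as the iterate approaches the constraint boundary (g -> 0),
    # which is what keeps the iterates feasible during learning.
    step = 0.1 * min(1.0, -g(x))
    return x - step * grad


# Toy usage: minimize f(x) = ||x||^2 subject to g(x) = x[0] - 0.5 <= 0, starting from a safe point.
f = lambda x: float(np.sum(x ** 2))
g = lambda x: float(x[0] - 0.5)
x = np.array([0.3, 0.4])
for _ in range(50):
    x = lbsgd_step(x, f, g)
print("iterate:", x, "constraint value:", g(x))
```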
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Stochastic Conservative Contextual Linear Bandits [8.684768561839146]
We study the problem of safe real-time decision making under uncertainty.
We propose a conservative contextual bandit formulation for real-time decision making.
arXiv Detail & Related papers (2022-03-29T14:50:50Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations.
More interestingly, we provide an algorithm, namely GCB_safe($\psi$, $\phi$), guaranteeing both sublinear pseudo-regret and safety with high probability at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Safe Online Convex Optimization with Unknown Linear Safety Constraints [0.0]
We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints.
The parameters that specify the linear safety constraints are unknown to the algorithm.
We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret of $O(T^{2/3})$.
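As intuition for how a projected-gradient method can stay safe with linear constraints, the following sketch runs projected gradient descent over a conservative estimate of the safe polytope. The estimate `A_hat`, the safety margin, and all constants are illustrative assumptions; the paper's SO-PGD additionally handles how the unknown constraint parameters are learned from a safe baseline action.

```python
import numpy as np


def project_to_safe_set(x, A_hat, b, margin=0.1):
    """Project x onto the conservative polytope {z : A_hat z <= b - margin}
    by cycling over violated half-spaces (simple alternating projections)."""
    z = x.copy()
    for _ in range(100):
        viol = A_hat @ z - (b - margin)
        j = int(np.argmax(viol))
        if viol[j] <= 0:
            break
        a = A_hat[j]
        z = z - (viol[j] / (a @ a)) * a    # project onto the violated half-space
    return z


def so_pgd(grad_fn, A_hat, b, x0, steps=200, lr=0.05):
    """Projected gradient descent that only ever plays actions in the estimated safe set."""
    x = project_to_safe_set(x0, A_hat, b)
    for _ in range(steps):
        x = project_to_safe_set(x - lr * grad_fn(x), A_hat, b)
    return x


# Toy usage: minimize ||x - target||^2 subject to estimated constraints A_hat x <= b.
target = np.array([1.0, 1.0])
A_hat = np.array([[1.0, 0.0], [0.0, 1.0]])   # stand-in for estimated constraint rows
b = np.array([0.6, 0.8])
x_star = so_pgd(lambda x: 2 * (x - target), A_hat, b, x0=np.zeros(2))
print("action:", x_star, "constraint slack:", b - A_hat @ x_star)
```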
arXiv Detail & Related papers (2021-11-14T19:49:19Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.