Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding
- URL: http://arxiv.org/abs/2003.10550v3
- Date: Mon, 21 Mar 2022 21:12:54 GMT
- Title: Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding
- Authors: Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M.
Sadler, and Alec Koppel
- Abstract summary: We show how to characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions.
We observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
- Score: 42.669970064867556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization is a framework for global search via maximum a
posteriori updates rather than simulated annealing, and has gained prominence
for decision-making under uncertainty. In this work, we cast Bayesian
optimization as a multi-armed bandit problem, where the payoff function is
sampled from a Gaussian process (GP). Further, we focus on action selections
via upper confidence bound (UCB) or expected improvement (EI) due to their
prevalent use in practice. Prior works using GPs for bandits cannot allow the
iteration horizon $T$ to be large, as the complexity of computing the posterior
parameters scales cubically with the number of past observations. To circumvent
this computational burden, we propose a simple statistical test: only
incorporate an action into the GP posterior when its conditional entropy
exceeds an $\epsilon$ threshold. Doing so permits us to precisely characterize
the trade-off between regret bounds of GP bandit algorithms and complexity of
the posterior distributions depending on the compression parameter $\epsilon$
for both discrete and continuous action sets. To the best of our knowledge,
this is the first result that obtains sublinear regret bounds while keeping
the growth rate of the posterior complexity sublinear, whereas this growth is
linear in the existing literature. Moreover, a provably finite bound on the
complexity can be achieved, but the algorithm then incurs $\epsilon$-regret,
meaning $\textbf{Reg}_T/T \rightarrow \mathcal{O}(\epsilon)$ as
$T\rightarrow \infty$. Experimentally, we observe state-of-the-art accuracy
and complexity trade-offs for GP bandit algorithms
applied to global optimization, suggesting the merits of compressed GPs in
bandit settings.
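As a concrete illustration of the thresholding test described in the abstract, the sketch below runs a GP-UCB loop in which a candidate observation is folded into the posterior only when its conditional entropy under the current GP exceeds the compression parameter $\epsilon$; for a Gaussian posterior this entropy is $\tfrac{1}{2}\log(2\pi e\,\sigma^2(x))$, so the test is monotone in the posterior variance. This is a minimal sketch, not the authors' implementation: the class name CompressedGPUCB, the RBF kernel, and all parameter values are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of GP-UCB with information
# thresholding: an observation enters the GP posterior only when its
# conditional entropy under the current posterior exceeds epsilon.
# CompressedGPUCB, the RBF kernel, and all parameter values are illustrative.
import numpy as np


def rbf(X1, X2, lengthscale=0.5):
    """Squared-exponential kernel matrix between row sets X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)


class CompressedGPUCB:
    def __init__(self, noise=0.1, beta=2.0, epsilon=0.1):
        self.noise, self.beta, self.epsilon = noise, beta, epsilon
        self.X = np.empty((0, 1))   # retained ("compressed") inputs
        self.y = np.empty(0)        # retained observations

    def posterior(self, Xq):
        """GP posterior mean and variance at the query points Xq."""
        if len(self.y) == 0:
            return np.zeros(len(Xq)), np.ones(len(Xq))
        K = rbf(self.X, self.X) + self.noise ** 2 * np.eye(len(self.y))
        Ks = rbf(Xq, self.X)
        mean = Ks @ np.linalg.solve(K, self.y)
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
        return mean, np.maximum(var, 1e-12)

    def select(self, Xgrid):
        """UCB action selection over a finite candidate grid."""
        mean, var = self.posterior(Xgrid)
        return Xgrid[np.argmax(mean + self.beta * np.sqrt(var))]

    def update(self, x, y):
        """Keep (x, y) only if its conditional entropy exceeds epsilon."""
        _, var = self.posterior(x[None, :])
        # Differential entropy of the Gaussian posterior at x; the paper's
        # entropy test is monotone in this variance.
        cond_entropy = 0.5 * np.log(2 * np.pi * np.e * var[0])
        if cond_entropy > self.epsilon:
            self.X = np.vstack([self.X, x[None, :]])
            self.y = np.append(self.y, y)


# Toy usage: maximize a 1-D function while keeping the posterior compressed.
def f(x):
    return np.sin(3 * x) - 0.5 * x ** 2

grid = np.linspace(-2.0, 2.0, 200)[:, None]
agent = CompressedGPUCB(epsilon=0.1)
for t in range(50):
    x = agent.select(grid)
    agent.update(x, f(x[0]) + 0.1 * np.random.randn())
print("retained points:", len(agent.y), "out of", 50)
```

Because rejected points never enter the kernel matrix, each posterior update costs cubic time in the number of retained points rather than in the full horizon $T$, which is the complexity saving described in the abstract.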
Related papers
- Regret Analysis for Randomized Gaussian Process Upper Confidence Bound [9.967062483758632]
This paper analyzes the randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB).
In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite.
arXiv Detail & Related papers (2024-09-02T06:49:29Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^{+}$ and Restarted $\text{SAVE}^{+}$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges, and achieves a cumulative regret bound of $\mathcal{O}(\gamma_T\sqrt{T})$.
arXiv Detail & Related papers (2022-03-15T13:17:53Z)
- Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z)
- Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^{\ast}(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z)
- Optimal Order Simple Regret for Gaussian Process Bandits [6.84224661918159]
We show that the regret bound of a pure exploration algorithm is significantly tighter than the existing bounds.
We show that this regret is optimal up to logarithmic factors for cases where a lower bound for the kernel is known.
arXiv Detail & Related papers (2021-08-20T16:49:32Z)
- Weighted Gaussian Process Bandits for Non-stationary Environments [30.49357960656324]
We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.
A key challenge is how to cope with infinite-dimensional feature maps.
We provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains.
arXiv Detail & Related papers (2021-07-06T03:37:33Z)
- Lenient Regret and Good-Action Identification in Gaussian Process Bandits [43.03669155559218]
We study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough".
On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold.
arXiv Detail & Related papers (2021-02-11T01:16:58Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)