Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding
- URL: http://arxiv.org/abs/2003.10550v3
- Date: Mon, 21 Mar 2022 21:12:54 GMT
- Title: Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding
- Authors: Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M.
Sadler, and Alec Koppel
- Abstract summary: We show how to characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions.
We observe state-of-the-art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
- Score: 42.669970064867556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization is a framework for global search via maximum a
posteriori updates rather than simulated annealing, and has gained prominence
for decision-making under uncertainty. In this work, we cast Bayesian
optimization as a multi-armed bandit problem, where the payoff function is
sampled from a Gaussian process (GP). Further, we focus on action selections
via upper confidence bound (UCB) or expected improvement (EI) due to their
prevalent use in practice. Prior works using GPs for bandits cannot allow the
iteration horizon $T$ to be large, as the complexity of computing the posterior
parameters scales cubically with the number of past observations. To circumvent
this computational burden, we propose a simple statistical test: only
incorporate an action into the GP posterior when its conditional entropy
exceeds an $\epsilon$ threshold. Doing so permits us to precisely characterize
the trade-off between regret bounds of GP bandit algorithms and complexity of
the posterior distributions depending on the compression parameter $\epsilon$
for both discrete and continuous action sets. To the best of our knowledge,
this is the first result that obtains sublinear regret bounds while keeping
the growth rate of the posterior complexity sublinear, whereas this growth is
linear in the existing literature. Moreover, a provably finite bound on the
complexity can be achieved, but the algorithm then incurs $\epsilon$-regret,
meaning $\textbf{Reg}_T/T \rightarrow \mathcal{O}(\epsilon)$ as
$T\rightarrow \infty$. Experimentally, we observe state-of-the-art accuracy
and complexity trade-offs for GP bandit algorithms
applied to global optimization, suggesting the merits of compressed GPs in
bandit settings.
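As a concrete illustration of the thresholding test described in the abstract, the sketch below runs a GP-UCB loop in which a candidate observation is folded into the posterior only when its conditional entropy under the current GP exceeds the compression parameter $\epsilon$; for a Gaussian posterior this entropy is $\tfrac{1}{2}\log(2\pi e\,\sigma^2(x))$, so the test is monotone in the posterior variance. This is a minimal sketch, not the authors' implementation: the class name CompressedGPUCB, the RBF kernel, and all parameter values are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of GP-UCB with information
# thresholding: an observation enters the GP posterior only when its
# conditional entropy under the current posterior exceeds epsilon.
# CompressedGPUCB, the RBF kernel, and all parameter values are illustrative.
import numpy as np


def rbf(X1, X2, lengthscale=0.5):
    """Squared-exponential kernel matrix between row sets X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)


class CompressedGPUCB:
    def __init__(self, noise=0.1, beta=2.0, epsilon=0.1):
        self.noise, self.beta, self.epsilon = noise, beta, epsilon
        self.X = np.empty((0, 1))   # retained ("compressed") inputs
        self.y = np.empty(0)        # retained observations

    def posterior(self, Xq):
        """GP posterior mean and variance at the query points Xq."""
        if len(self.y) == 0:
            return np.zeros(len(Xq)), np.ones(len(Xq))
        K = rbf(self.X, self.X) + self.noise ** 2 * np.eye(len(self.y))
        Ks = rbf(Xq, self.X)
        mean = Ks @ np.linalg.solve(K, self.y)
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
        return mean, np.maximum(var, 1e-12)

    def select(self, Xgrid):
        """UCB action selection over a finite candidate grid."""
        mean, var = self.posterior(Xgrid)
        return Xgrid[np.argmax(mean + self.beta * np.sqrt(var))]

    def update(self, x, y):
        """Keep (x, y) only if its conditional entropy exceeds epsilon."""
        _, var = self.posterior(x[None, :])
        # Differential entropy of the Gaussian posterior at x; the paper's
        # entropy test is monotone in this variance.
        cond_entropy = 0.5 * np.log(2 * np.pi * np.e * var[0])
        if cond_entropy > self.epsilon:
            self.X = np.vstack([self.X, x[None, :]])
            self.y = np.append(self.y, y)


# Toy usage: maximize a 1-D function while keeping the posterior compressed.
def f(x):
    return np.sin(3 * x) - 0.5 * x ** 2

grid = np.linspace(-2.0, 2.0, 200)[:, None]
agent = CompressedGPUCB(epsilon=0.1)
for t in range(50):
    x = agent.select(grid)
    agent.update(x, f(x[0]) + 0.1 * np.random.randn())
print("retained points:", len(agent.y), "out of", 50)
```

Because rejected points never enter the kernel matrix, each posterior update costs cubic time in the number of retained points rather than in the full horizon $T$, which is the complexity saving described in the abstract.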
Related papers
- Regret Analysis for Randomized Gaussian Process Upper Confidence Bound [9.967062483758632]
This paper analyzes the randomized variant of GP-UCB called improved randomized GP-UCB (IRGP-UCB).
In both regret analyses, IRGP-UCB achieves a sub-linear regret upper bound without increasing the confidence parameter if the input domain is finite.
arXiv Detail & Related papers (2024-09-02T06:49:29Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^{+}$ and Restarted $\text{SAVE}^{+}$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges, and achieves a cumulative regret bound of $\mathcal{O}(\gamma_T\sqrt{T})$.
arXiv Detail & Related papers (2022-03-15T13:17:53Z)
- Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$-uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS).
We show that our algorithm achieves optimal dependence on $epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z)
- Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^{\ast}(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z)
- Optimal Order Simple Regret for Gaussian Process Bandits [6.84224661918159]
We show that the regret bound of a pure exploration algorithm is significantly tighter than the existing bounds.
We show that this regret is optimal up to logarithmic factors for cases where a lower bound for the kernel is known.
arXiv Detail & Related papers (2021-08-20T16:49:32Z)
- Weighted Gaussian Process Bandits for Non-stationary Environments [30.49357960656324]
We develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.
A key challenge is how to cope with infinite-dimensional feature maps.
We provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains.
arXiv Detail & Related papers (2021-07-06T03:37:33Z)
- Lenient Regret and Good-Action Identification in Gaussian Process Bandits [43.03669155559218]
We study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough".
On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold.
arXiv Detail & Related papers (2021-02-11T01:16:58Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)