Lenient Regret and Good-Action Identification in Gaussian Process
Bandits
- URL: http://arxiv.org/abs/2102.05793v1
- Date: Thu, 11 Feb 2021 01:16:58 GMT
- Title: Lenient Regret and Good-Action Identification in Gaussian Process
Bandits
- Authors: Xu Cai, Selwyn Gomes, Jonathan Scarlett
- Abstract summary: We study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is "good enough"
On the practical side, we consider the problem of finding a single "good action" according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold.
- Score: 43.03669155559218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of Gaussian process (GP) bandits under
relaxed optimization criteria stating that any function value above a certain
threshold is "good enough". On the theoretical side, we study various
\emph{\lenient regret} notions in which all near-optimal actions incur zero
penalty, and provide upper bounds on the lenient regret for GP-UCB and an
elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time
horizon $T$) resulting from zooming extremely close towards the function
maximum. In addition, we complement these upper bounds with
algorithm-independent lower bounds. On the practical side, we consider the
problem of finding a single "good action" according to a known pre-specified
threshold, and introduce several good-action identification algorithms that
exploit knowledge of the threshold. We experimentally find that such algorithms
can often find a good action faster than standard optimization-based
approaches.
Related papers
- Near-Optimal Algorithm for Non-Stationary Kernelized Bandits [6.379833644595456]
We study a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization.
We show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Mat'ern kernels.
We propose a novel near-optimal algorithm called restarting phased elimination with random permutation.
arXiv Detail & Related papers (2024-10-21T14:28:26Z) - Regret Bounds for Expected Improvement Algorithms in Gaussian Process
Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges, and achieves a cumulative regret bound of $mathcal O(gamma_TsqrtT)$.
arXiv Detail & Related papers (2022-03-15T13:17:53Z) - Misspecified Gaussian Process Bandit Optimization [59.30399661155574]
Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem.
We introduce a emphmisspecified kernelized bandit setting where the unknown function can be $epsilon$--uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS)
We show that our algorithm achieves optimal dependence on $epsilon$ with no prior knowledge of misspecification.
arXiv Detail & Related papers (2021-11-09T09:00:02Z) - Optimal Order Simple Regret for Gaussian Process Bandits [6.84224661918159]
We show that a pure exploration algorithm is significantly tighter than the existing bounds.
We show that this regret is optimal up to logarithmically for cases where a lower bound on a kernel is known.
arXiv Detail & Related papers (2021-08-20T16:49:32Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Regret and Belief Complexity Trade-off in Gaussian Process Bandits via
Information Thresholding [42.669970064867556]
We show how to characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions.
We observe state of the art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization.
arXiv Detail & Related papers (2020-03-23T21:05:15Z) - Corruption-Tolerant Gaussian Process Bandit Optimization [130.60115798580136]
We consider the problem of optimizing an unknown (typically non-producing) function with a bounded norm.
We introduce an algorithm based on Fast-Slow GP-UCB based on "fast but non-robust" and "slow"
We argue that certain dependencies cannot be required depending on the corruption level.
arXiv Detail & Related papers (2020-03-04T09:46:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.