Delayed Feedback in Kernel Bandits
- URL: http://arxiv.org/abs/2302.00392v1
- Date: Wed, 1 Feb 2023 12:03:19 GMT
- Title: Delayed Feedback in Kernel Bandits
- Authors: Sattar Vakili, Danyal Ahmed, Alberto Bernacchia, Ciara Pike-Burke
- Abstract summary: Black box optimisation of an unknown function is a ubiquitous problem in machine learning, academic research and industrial production.
We consider a kernel bandit problem under stochastically delayed feedback, and propose an algorithm with $\tilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[\tau])$ regret.
This represents a significant improvement over the state of the art regret bound of $\tilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$.
- Score: 16.11544965325835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Black box optimisation of an unknown function from expensive and noisy
evaluations is a ubiquitous problem in machine learning, academic research and
industrial production. An abstraction of the problem can be formulated as a
kernel based bandit problem (also known as Bayesian optimisation), where a
learner aims at optimising a kernelized function through sequential noisy
observations. The existing work predominantly assumes feedback is immediately
available; an assumption which fails in many real world situations, including
recommendation systems, clinical trials and hyperparameter tuning. We consider
a kernel bandit problem under stochastically delayed feedback, and propose an
algorithm with $\tilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[\tau])$
regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum
information gain of the kernel with $T$ observations, and $\tau$ is the delay
random variable. This represents a significant improvement over the state of
the art regret bound of
$\tilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$ reported
in Verma et al. (2022). In particular, for very non-smooth kernels, the
information gain grows almost linearly in time, trivializing the existing
results. We also validate our theoretical results with simulations.
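The paper's exact algorithm is not reproduced here, but the setting can be illustrated with a minimal GP-UCB-style sketch in which each reward arrives after a stochastic delay and the posterior is fit only on feedback that has already arrived. The toy objective, Matérn kernel, geometric delay distribution and confidence parameter `beta` below are illustrative assumptions, not choices from the paper.

```python
# Minimal sketch (not the paper's algorithm): GP-UCB on a 1-D domain where each
# reward arrives after a random geometric delay; only feedback that has already
# arrived is used to fit the posterior. Kernel, delay law and beta are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(5 * x)         # unknown objective (toy)
grid = np.linspace(0, 2, 200).reshape(-1, 1)              # candidate actions
noise, beta, T = 0.1, 2.0, 100

pending = []                                              # (arrival time, action, reward)
X_obs, y_obs = [], []
for t in range(T):
    # Collect feedback whose stochastic delay has elapsed by time t.
    X_obs += [x for (s, x, _) in pending if s <= t]
    y_obs += [y for (s, _, y) in pending if s <= t]
    pending = [(s, x, y) for (s, x, y) in pending if s > t]

    # Posterior built from the available (already arrived) observations only.
    if X_obs:
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=noise ** 2)
        gp.fit(np.array(X_obs), np.array(y_obs))
        mu, sd = gp.predict(grid, return_std=True)
    else:
        mu, sd = np.zeros(len(grid)), np.ones(len(grid))

    # Optimistic action; its feedback is queued with a stochastic delay tau.
    x_t = grid[np.argmax(mu + np.sqrt(beta) * sd)]
    y_t = f(x_t[0]) + noise * rng.standard_normal()
    tau = rng.geometric(0.3)                              # assumed delay distribution
    pending.append((t + tau, x_t, y_t))
```

In such a sketch the delay simply postpones when observations become usable; the contribution of the paper, per the abstract, is an algorithm for which the delay enters the regret only additively as $\mathbb{E}[\tau]$ rather than multiplied by $\Gamma_k(T)$.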
Related papers
- Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy,
and Multi-fidelity Feedback [11.064341598231195]
In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedback from an evaluation or simulation oracle.
We propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS).
We provide a generic proof technique to quantify the regret of PCTS under delayed, noisy, and multi-fidelity feedback.
arXiv Detail & Related papers (2021-10-14T08:55:41Z) - Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale gradient computed at step $t - d_t$ for an arbitrary delay $d_t$.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
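As a rough illustration of the delayed-gradient update described in the entry above (a sketch under assumptions, not the paper's algorithm), the snippet below applies at each step $t$ a gradient evaluated at the stale iterate from step $t - d_t$; the quadratic objective, step size, and delay range are assumed for the example.

```python
# Sketch of gradient descent with arbitrarily delayed gradients on a toy objective:
# at step t the update uses the gradient evaluated at the iterate from step t - d_t.
import numpy as np

rng = np.random.default_rng(1)
grad = lambda w: 2.0 * (w - 3.0)             # gradient of (w - 3)^2 (toy objective)

T, lr = 200, 0.05
w = 0.0
history = [w]                                # iterates indexed by time step
for t in range(T):
    d_t = int(rng.integers(0, 5))            # arbitrary delay in {0, ..., 4} (assumed)
    stale = history[max(0, t - d_t)]         # iterate the delayed gradient refers to
    w = w - lr * grad(stale)
    history.append(w)

print(f"final iterate: {w:.4f} (optimum is 3.0)")
```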
arXiv Detail & Related papers (2021-06-22T15:50:45Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
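The exploration mechanism summarised above, perturbing the training data with i.i.d. scalar noise before a least-squares fit, can be sketched in a stripped-down bandit form as follows; the linear feature map, noise scale, and greedy action rule are illustrative assumptions, not the paper's full RL algorithm.

```python
# Sketch of randomized exploration via perturbed least-squares fits: each round the
# reward model is refit on targets perturbed with freshly drawn i.i.d. Gaussian noise,
# and the agent acts greedily with respect to the perturbed fit.
import numpy as np

rng = np.random.default_rng(2)
actions = np.linspace(-1, 1, 21)
phi = lambda a: np.array([1.0, a, a ** 2])            # assumed feature map
theta_true = np.array([0.2, 1.0, -0.8])               # unknown reward parameters

X, y = [], []
for t in range(100):
    if X:
        perturb = 0.3 * rng.standard_normal(len(y))   # i.i.d. scalar perturbations
        theta_hat, *_ = np.linalg.lstsq(np.array(X), np.array(y) + perturb, rcond=None)
    else:
        theta_hat = np.zeros(3)
    a_t = actions[np.argmax([phi(a) @ theta_hat for a in actions])]
    X.append(phi(a_t))
    y.append(phi(a_t) @ theta_true + 0.1 * rng.standard_normal())
```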
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound with order $\mathcal{O}(N^{-\sqrt{N}})$.
arXiv Detail & Related papers (2021-05-10T13:07:44Z) - On Information Gain and Regret Bounds in Gaussian Process Bandits [8.499604625449907]
Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback observations.
For the Matérn family of kernels, lower bounds on $\gamma_T$, and regret under the frequentist setting, are known.
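Because the regret bound of the main paper is stated in terms of the maximum information gain $\Gamma_k(T)$ (the quantity written $\gamma_T$ in the entry above), a small sketch of how the information gain of a given set of $T$ observations, $\frac{1}{2}\log\det(I + \sigma^{-2}K_T)$, can be evaluated may be useful; the Matérn parameters, noise level and sampled query points below are assumptions for illustration, and the maximum information gain is the maximum of this quantity over all size-$T$ query sets.

```python
# Sketch: information gain of T observations under a GP with a Matern kernel,
# I(y_T; f) = 0.5 * log det(I + sigma^{-2} K_T), for illustrative parameter values.
import numpy as np
from sklearn.gaussian_process.kernels import Matern

sigma = 0.1                                                  # observation noise level
X = np.random.default_rng(3).uniform(0, 1, size=(50, 1))     # T = 50 query points
K = Matern(length_scale=0.2, nu=1.5)(X)                      # T x T kernel matrix
_, logdet = np.linalg.slogdet(np.eye(len(X)) + K / sigma ** 2)
print(f"information gain of these 50 observations: {0.5 * logdet:.2f} nats")
```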
arXiv Detail & Related papers (2020-09-15T10:15:29Z) - Time-varying Gaussian Process Bandit Optimization with Non-constant
Evaluation Time [93.6788993843846]
We propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time.
Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem.
arXiv Detail & Related papers (2020-03-10T13:28:33Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been conjectured due to the apparent strong convexity of the problem.
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.