Smooth Bandit Optimization: Generalization to H\"older Space
- URL: http://arxiv.org/abs/2012.06076v1
- Date: Fri, 11 Dec 2020 01:43:25 GMT
- Title: Smooth Bandit Optimization: Generalization to H\"older Space
- Authors: Yusha Liu, Yining Wang, Aarti Singh
- Abstract summary: We consider bandit optimization of a smooth reward function, where the goal is cumulative regret minimization.
Our main result is a generalization of the reward function to the H\"older space with exponent $\alpha>1$.
We show that it achieves a regret rate that matches the existing lower bound for adaptation within the $\alpha\leq 1$ subset.
- Score: 37.15553727896912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider bandit optimization of a smooth reward function, where the goal
is cumulative regret minimization. This problem has been studied for
$\alpha$-H\"older continuous (including Lipschitz) functions with $0<\alpha\leq
1$. Our main result is a generalization of the reward function to the H\"older
space with exponent $\alpha>1$ to bridge the gap between Lipschitz bandits and
infinitely-differentiable models such as linear bandits. For H\"older
continuous functions, approaches based on random sampling in bins of a
discretized domain suffice and are optimal. In contrast, we propose a class of
two-layer algorithms that deploy misspecified linear/polynomial bandit
algorithms in bins. We demonstrate that the proposed algorithm can exploit
higher-order smoothness of the function by deriving a regret upper bound of
$\tilde{O}(T^{\frac{d+\alpha}{d+2\alpha}})$ when $\alpha>1$, which matches the
existing lower bound. We also study adaptation to unknown function smoothness
over a continuous scale of H\"older spaces indexed by $\alpha$, with a bandit
model selection approach applied with our proposed two-layer algorithms. We
show that it achieves a regret rate that matches the existing lower bound for
adaptation within the $\alpha\leq 1$ subset.
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise [45.60098988395789]
We propose a new approach for constructing UCB-type algorithms for multi-armed bandits.
We show that, in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O(\dots)$.
arXiv Detail & Related papers (2024-02-10T22:38:21Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in \cite{lu2021low} under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restraint of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, while G-ESTS can achieve a bound of $\tilde{O}(\dots)$.
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - On the Interplay Between Misspecification and Sub-optimality Gap in
Linear Contextual Bandits [76.2262680277608]
We study linear contextual bandits in the misspecified setting, where the expected reward function can be approximated by a linear function class.
We show that our algorithm enjoys the same gap-dependent regret bound $\tilde{O}(d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.
arXiv Detail & Related papers (2023-03-16T15:24:29Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed-confidence setting under a generative model.
The lower bound, as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Nearly Optimal Algorithms for Level Set Estimation [21.83736847203543]
We provide a new approach to the level set estimation problem by relating it to recent adaptive experimental design methods for linear bandits.
We show that our bounds are nearly optimal, namely, our upper bounds match existing lower bounds for threshold linear bandits.
arXiv Detail & Related papers (2021-11-02T17:45:02Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual
Linear Bandit [10.604939762790517]
We prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z) - Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
In a low-rank linear bandit problem, the reward of an action is the inner product between the action and an unknown low-rank matrix $\Theta^*$.
We propose an algorithm based on a novel combination of online-to-confidence-set conversion \citep{abbasi2012online} and the exponentially weighted average forecaster constructed by a covering of low-rank matrices.
arXiv Detail & Related papers (2020-06-04T15:30:14Z)