Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed
Bandits
- URL: http://arxiv.org/abs/2201.11921v1
- Date: Fri, 28 Jan 2022 03:53:39 GMT
- Title: Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed
Bandits
- Authors: Jiatai Huang, Yan Dai, Longbo Huang
- Abstract summary: We develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits.
\texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments.
\texttt{AdaTINF} is the first algorithm that can adapt to both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound.
- Score: 22.18577284185939
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we generalize the concept of heavy-tailed multi-armed bandits
to adversarial environments, and develop robust best-of-both-worlds algorithms
for heavy-tailed multi-armed bandits (MAB), where losses have $\alpha$-th
($1<\alpha\le 2$) moments bounded by $\sigma^\alpha$, while the variances may
not exist. Specifically, we design an algorithm \texttt{HTINF}: when the
heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent,
\texttt{HTINF} simultaneously achieves the optimal regret for both stochastic
and adversarial environments, without knowing the actual environment type
a priori. When $\alpha,\sigma$ are unknown, \texttt{HTINF} achieves a $\log
T$-style instance-dependent regret in stochastic cases and $o(T)$ no-regret
guarantee in adversarial cases. We further develop an algorithm
\texttt{AdaTINF}, achieving $\mathcal O(\sigma K^{1-\nicefrac
1\alpha}T^{\nicefrac{1}{\alpha}})$ minimax optimal regret even in adversarial
settings, without prior knowledge of $\alpha$ and $\sigma$. This result matches
the known regret lower bound (Bubeck et al., 2013), which was proved assuming a
stochastic environment with both $\alpha$ and $\sigma$ known. To our knowledge, the
proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds
regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to
both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound in
the classical heavy-tailed stochastic MAB setting and our novel adversarial
formulation.
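
The abstract states the problem class (losses with $\alpha$-th moments bounded by $\sigma^\alpha$, $1<\alpha\le 2$) and the target minimax rate $\mathcal O(\sigma K^{1-\nicefrac 1\alpha}T^{\nicefrac 1\alpha})$, but not the algorithms' update rules. As a minimal sketch of the setting only, the Python snippet below simulates a heavy-tailed $K$-armed instance and runs a truncated-mean robust-UCB baseline in the spirit of Bubeck et al. (2013); the noise model, constants, and truncation schedule are illustrative assumptions, not the \texttt{HTINF}/\texttt{AdaTINF} procedures (whose names suggest INF-type follow-the-regularized-leader methods).

```python
import numpy as np

# Minimal sketch (NOT the paper's HTINF/AdaTINF): a K-armed bandit whose
# losses have finite alpha-th moments but infinite variance, played with a
# truncated-mean "robust UCB" baseline in the spirit of Bubeck et al. (2013).

rng = np.random.default_rng(0)
K, T, alpha = 5, 20_000, 1.5                 # alpha in (1, 2]: variance may not exist
means = rng.uniform(0.2, 0.8, size=K)        # hypothetical per-arm mean losses
best = means.min()

def sample_loss(arm: int) -> float:
    # Symmetric Pareto noise with tail index 1.8: alpha-th moments are finite
    # for alpha < 1.8, but the variance is infinite, so sub-Gaussian
    # confidence intervals (plain UCB) are not valid here.
    noise = rng.pareto(1.8) * rng.choice([-1.0, 1.0])
    return float(means[arm] + noise)

counts = np.zeros(K)
trunc_sums = np.zeros(K)                     # sums of truncated losses
pseudo_regret = 0.0

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                          # initialization: pull each arm once
    else:
        # Lower confidence bound on truncated mean losses (smaller is better).
        # The width n**((1-alpha)/alpha) mirrors the robust-UCB scaling up to
        # logarithmic factors; the constant 4.0 is an arbitrary illustration.
        mu_hat = trunc_sums / counts
        width = 4.0 * counts ** ((1.0 - alpha) / alpha)
        arm = int(np.argmin(mu_hat - width))
    loss = sample_loss(arm)
    counts[arm] += 1
    level = counts[arm] ** (1.0 / alpha)     # truncation threshold grows with pulls
    trunc_sums[arm] += float(np.clip(loss, -level, level))
    pseudo_regret += means[arm] - best

print(f"pseudo-regret over T={T} rounds: {pseudo_regret:.1f}")
```

The design point the sketch illustrates: with infinite variance, sub-Gaussian confidence widths are invalid, and truncating each observation at a level growing like $n^{\nicefrac 1\alpha}$ trades a small bias for controllable deviations, which is the standard route to the $T^{\nicefrac 1\alpha}$-type rates quoted above.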
Related papers
- Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise [45.60098988395789]
We propose a new approach for constructing UCB-type algorithms for multi-armed bandits.
We show that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound, improving on the previously known rate.
arXiv Detail & Related papers (2024-02-10T22:38:21Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in (Lu et al., 2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde O(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve a comparable bound.
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde\Theta\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal O}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits [45.43968161616453]
We study the optimal batch-regret tradeoff for batch linear contextual bandits.
We prove its regret guarantee, which features a two-phase expression as the time horizon $T$ grows.
We also prove a new matrix concentration inequality with dependence on dynamic upper bounds.
arXiv Detail & Related papers (2021-10-15T12:32:33Z) - Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual
Linear Bandit [10.604939762790517]
We prove a minimax lower bound of $\mathcal O\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$ for the cumulative regret.
Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy.
arXiv Detail & Related papers (2021-09-23T19:35:38Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive regret terms under corruption.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using stochastic gradient and Hessian-vector products, and study the complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z) - Problem-Complexity Adaptive Model Selection for Stochastic Linear
Bandits [20.207989166682832]
We consider the problem of model selection for two popular linear bandit settings.
In the first setting, we consider the $K$ armed mixture bandits, where the mean reward of arm $i \in [K]$ is $\mu_i + \langle \alpha_{i,t}, \theta^* \rangle$.
We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is a priori unknown.
arXiv Detail & Related papers (2020-06-04T02:19:00Z) - Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
We study learning in contextual bandits with the help of loss predictors.
We show that the optimal regret is $\mathcal O(\min\{\sqrt{T}, \sqrt{\mathcal E}\,T^{\frac{1}{3}}\})$ when $\mathcal E$ is known.
arXiv Detail & Related papers (2020-03-04T07:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.