Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise
- URL: http://arxiv.org/abs/2402.07062v1
- Date: Sat, 10 Feb 2024 22:38:21 GMT
- Title: Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise
- Authors: Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov
- Abstract summary: We propose a new algorithm for constructing UCB-type algorithms for multi-armed bandits.
We show that in the case of symmetric noise in the reward, we can achieve an $O(log TsqrtKTlog T)$ regret bound instead of $Oleft.
- Score: 45.60098988395789
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this study, we propose a new method for constructing UCB-type algorithms
for stochastic multi-armed bandits based on general convex optimization methods
with an inexact oracle. We derive the regret bounds corresponding to the
convergence rates of the optimization methods. We propose a new algorithm
Clipped-SGD-UCB and show, both theoretically and empirically, that in the case
of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$
regret bound instead of $O\left (T^{\frac{1}{1+\alpha}}
K^{\frac{\alpha}{1+\alpha}} \right)$ for the case when the reward distribution
satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$
($\alpha \in (0, 1])$, i.e. perform better than it is assumed by the general
lower bound for bandits with heavy-tails. Moreover, the same bound holds even
when the reward distribution does not have the expectation, that is, when
$\alpha<0$.
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Generalized Regret Analysis of Thompson Sampling using Fractional
Posteriors [12.43000662545423]
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve multi-armed bandit problems.
We consider a variant of TS, named $alpha$-TS, where we use a fractional or $alpha$-posterior instead of the standard posterior distribution.
arXiv Detail & Related papers (2023-09-12T16:15:33Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Private estimation algorithms for stochastic block models and mixture
models [63.07482515700984]
General tools for designing efficient private estimation algorithms.
First efficient $(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z) - Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
We study the problem of high-dimensional sparse mean estimation in the presence of an $epsilon$-fraction of adversarial outliers.
Our algorithms follow the Sum-of-Squares based, to algorithms approach.
arXiv Detail & Related papers (2022-06-07T16:49:54Z) - Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed
Bandits [22.18577284185939]
We develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits.
textttHTINF simultaneously achieves the optimal regret for both and adversarial environments.
textttAdaTINF is the first algorithm that can adapt to both $alpha$ and $sigma$ to achieve optimal gap-indepedent regret bound.
arXiv Detail & Related papers (2022-01-28T03:53:39Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $Oast(sqrtTgamma_T)$ using $O(loglog T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Smooth Bandit Optimization: Generalization to H\"older Space [37.15553727896912]
We consider bandit optimization of a smooth reward function, where the goal is cumulative regret.
Our main result is in generalization of the reward function to H"older space with exponent $alpha>1$.
We show that it achieves regret rate that matches the existing lower bound for adaptation within the $alphaleq 1$ subset.
arXiv Detail & Related papers (2020-12-11T01:43:25Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $mathcalO( log d + sqrtT )$ in general, and as $mathcalO( log d + sqrtT )$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.