Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise
- URL: http://arxiv.org/abs/2402.07062v1
- Date: Sat, 10 Feb 2024 22:38:21 GMT
- Title: Fast UCB-type algorithms for stochastic bandits with heavy and super
heavy symmetric noise
- Authors: Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov
- Abstract summary: We propose a new method for constructing UCB-type algorithms for multi-armed bandits.
We show that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left(T^{\frac{1}{1+\alpha}} K^{\frac{\alpha}{1+\alpha}}\right)$.
- Score: 45.60098988395789
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this study, we propose a new method for constructing UCB-type algorithms
for stochastic multi-armed bandits based on general convex optimization methods
with an inexact oracle. We derive the regret bounds corresponding to the
convergence rates of the optimization methods. We propose a new algorithm
Clipped-SGD-UCB and show, both theoretically and empirically, that in the case
of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$
regret bound instead of $O\left (T^{\frac{1}{1+\alpha}}
K^{\frac{\alpha}{1+\alpha}} \right)$ for the case when the reward distribution
satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$
($\alpha \in (0, 1]$), i.e., performing better than the general lower bound for
bandits with heavy tails suggests. Moreover, the same bound holds even when the
reward distribution does not have an expectation, that is, when $\alpha<0$.
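The abstract describes Clipped-SGD-UCB only at a high level. The sketch below, in Python, illustrates the idea under stated assumptions: each arm's mean is tracked by clipped SGD on the quadratic loss $\frac{1}{2}\mathbb{E}[(\theta - X)^2]$, whose stochastic gradient $(\theta - X)$ is clipped so the update stays stable under heavy or even mean-less symmetric noise, and arms are chosen by a UCB index. The class name, clipping level, $1/n$ step size, and the $\sqrt{\log t / n}$ exploration bonus are illustrative assumptions, not the exact constants derived in the paper.

```python
import numpy as np


def clip(v, lam):
    """Clip a scalar to the interval [-lam, lam]."""
    return max(-lam, min(lam, v))


class ClippedSGDUCB:
    """Minimal sketch of a Clipped-SGD-UCB-style bandit (illustrative, not the paper's exact algorithm).

    Each arm keeps a mean estimate produced by clipped SGD on the quadratic
    loss f(theta) = E[(theta - X)^2] / 2; clipping the stochastic gradient
    (theta - X) keeps the estimate stable under heavy and super-heavy
    symmetric reward noise.  The exploration bonus is a generic
    sqrt(log t / n) radius, assumed here for illustration.
    """

    def __init__(self, n_arms, clip_level=1.0, bonus_scale=1.0):
        self.n_arms = n_arms
        self.clip_level = clip_level            # gradient clipping threshold (assumed constant)
        self.bonus_scale = bonus_scale          # scale of the exploration bonus (assumed)
        self.theta = np.zeros(n_arms)           # clipped-SGD mean estimates
        self.pulls = np.zeros(n_arms, dtype=int)

    def select_arm(self, t):
        # Play every arm once before trusting the estimates.
        untried = np.where(self.pulls == 0)[0]
        if untried.size > 0:
            return int(untried[0])
        bonus = self.bonus_scale * np.sqrt(np.log(t + 1) / self.pulls)
        return int(np.argmax(self.theta + bonus))

    def update(self, arm, reward):
        self.pulls[arm] += 1
        step = 1.0 / self.pulls[arm]            # 1/n step size (assumed)
        grad = self.theta[arm] - reward         # gradient of the quadratic loss
        self.theta[arm] -= step * clip(grad, self.clip_level)


# Toy run with symmetric Cauchy noise, i.e. rewards with no expectation.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = np.array([0.0, 0.3, 0.6])
    agent = ClippedSGDUCB(n_arms=3)
    for t in range(5000):
        arm = agent.select_arm(t)
        reward = true_means[arm] + rng.standard_cauchy()
        agent.update(arm, reward)
    print("pull counts:", agent.pulls)
    print("estimates  :", np.round(agent.theta, 3))
```

Because the Cauchy noise is symmetric about the true mean, the clipped gradient has zero mean at the true mean even though the reward has no expectation, which is the regime the abstract refers to as $\alpha < 0$.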
Related papers
- Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability [49.96531901205305]
We propose the \emph{first} algorithm with $\tilde{O}(\epsilon^{-1})$ sample complexity under single-policy concentrability for offline contextual bandits.
Our proof leverages the strong convexity of the KL regularization, and the conditional non-negativity of the gap between the true reward and its pessimistic estimator.
We extend our algorithm to contextual dueling bandits and achieve a similar nearly optimal sample complexity.
arXiv Detail & Related papers (2025-02-09T22:14:45Z) - Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping [21.865728815935665]
We provide the first convergence guarantees under heavy-tailed noises without gradient clipping.
We also establish the first $\mathcal{O}(T^{\frac{1-\mathfrak{p}}{3\mathfrak{p}-2}})$ convergence rate in the case where the tail index $\mathfrak{p}$ is unknown in advance.
arXiv Detail & Related papers (2024-12-27T08:46:46Z) - p-Mean Regret for Stochastic Bandits [52.828710025519996]
We introduce a simple, unified UCB-based algorithm that achieves novel $p$-mean regret bounds.
Our framework encompasses both average cumulative regret and Nash regret as special cases.
arXiv Detail & Related papers (2024-12-14T08:38:26Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Generalized Regret Analysis of Thompson Sampling using Fractional
Posteriors [12.43000662545423]
Thompson sampling (TS) is one of the most popular and earliest algorithms to solve multi-armed bandit problems.
We consider a variant of TS, named $\alpha$-TS, where we use a fractional or $\alpha$-posterior instead of the standard posterior distribution.
arXiv Detail & Related papers (2023-09-12T16:15:33Z) - Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed
Bandits [22.18577284185939]
We develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits.
\texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments.
\texttt{AdaTINF} is the first algorithm that can adapt to both $\alpha$ and $\sigma$ to achieve the optimal gap-independent regret bound.
arXiv Detail & Related papers (2022-01-28T03:53:39Z) - Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
We introduce a batch algorithm inspired by finite-arm bandit algorithms.
We show that it achieves the cumulative regret upper bound $O^\ast(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$.
In addition, we propose a modified version of our algorithm, and characterize how the regret is impacted by the number of batches.
arXiv Detail & Related papers (2021-10-15T00:54:04Z) - Smooth Bandit Optimization: Generalization to H\"older Space [37.15553727896912]
We consider bandit optimization of a smooth reward function, where the goal is cumulative regret.
Our main result is a generalization to reward functions in the H\"older space with exponent $\alpha > 1$.
We show that it achieves a regret rate that matches the existing lower bound for adaptation within the $\alpha \leq 1$ subset.
arXiv Detail & Related papers (2020-12-11T01:43:25Z) - Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso bandit is an algorithm that estimates the vector defining the reward function as well as its sparse support.
We establish non-asymptotic regret upper bounds scaling as $\mathcal{O}(\log d + \sqrt{T})$ in general, and as $\mathcal{O}(\log d + \log T)$ under the so-called margin condition.
arXiv Detail & Related papers (2020-10-22T19:14:37Z)