Improved Analysis of Robustness of the Tsallis-INF Algorithm to
Adversarial Corruptions in Stochastic Multiarmed Bandits
- URL: http://arxiv.org/abs/2103.12487v1
- Date: Tue, 23 Mar 2021 12:26:39 GMT
- Title: Improved Analysis of Robustness of the Tsallis-INF Algorithm to
Adversarial Corruptions in Stochastic Multiarmed Bandits
- Authors: Saeed Masoudian, Yevgeny Seldin
- Abstract summary: We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021).
In particular, for $C = \Theta\left(\frac{T}{\log T}\right)$, where $T$ is the time horizon, we achieve an improvement by a multiplicative factor of $\sqrt{\frac{\log T}{\log\log T}}$ relative to the bound of Zimmert and Seldin (2021).
We also improve the dependence of the regret bound on the time horizon from $\log T$ to $\log \frac{(K-1)T}{(\sum_{i\neq i^*}\frac{1}{\Delta_i})^2}$.
- Score: 12.462608802359936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and
Seldin (2021). In the adversarial regime with a self-bounding constraint and
the stochastic regime with adversarial corruptions as its special case we
improve the dependence on corruption magnitude $C$. In particular, for $C =
\Theta\left(\frac{T}{\log T}\right)$, where $T$ is the time horizon, we achieve
an improvement by a multiplicative factor of $\sqrt{\frac{\log T}{\log\log T}}$
relative to the bound of Zimmert and Seldin (2021). We also improve the
dependence of the regret bound on time horizon from $\log T$ to $\log
\frac{(K-1)T}{(\sum_{i\neq i^*}\frac{1}{\Delta_i})^2}$, where $K$ is the number
of arms, $\Delta_i$ are suboptimality gaps for suboptimal arms $i$, and $i^*$
is the optimal arm. Additionally, we provide a general analysis, which allows one to
achieve the same kind of improvement for generalizations of Tsallis-INF to
other settings beyond multiarmed bandits.
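To make the algorithm under analysis concrete, below is a minimal sketch of Tsallis-INF with the 1/2-Tsallis entropy, the setting studied by Zimmert and Seldin (2021) and refined in this paper. The learning-rate schedule $\eta_t = 1/\sqrt{t}$ and the plain importance-weighted loss estimator are simplifying assumptions for illustration; the paper's analysis also covers a reduced-variance estimator, and the exact tuning differs. The closed form $w_i = 1/(4\eta^2(\hat{L}_i + \lambda)^2)$ follows from the FTRL first-order conditions, with the normalizer $\lambda$ found by bisection.

    import numpy as np

    def tsallis_inf_weights(cum_loss_est, eta):
        # Closed-form FTRL solution for the 1/2-Tsallis regularizer:
        #   w_i = 1 / (4 * eta^2 * (Lhat_i + lam)^2),  sum_i w_i = 1.
        # The weight sum is monotone decreasing in lam, so bisection on
        # the bracket below (sum >= 1 at lo, sum <= 1 at hi) finds lam.
        shifted = cum_loss_est - cum_loss_est.min()  # the shift is absorbed into lam
        K = len(shifted)
        lo, hi = 1.0 / (2.0 * eta), np.sqrt(K) / (2.0 * eta)
        for _ in range(64):  # 64 halvings: well below float precision
            lam = 0.5 * (lo + hi)
            total = np.sum(1.0 / (4.0 * eta**2 * (shifted + lam) ** 2))
            lo, hi = (lam, hi) if total > 1.0 else (lo, lam)
        w = 1.0 / (4.0 * eta**2 * (shifted + 0.5 * (lo + hi)) ** 2)
        return w / w.sum()  # clean up residual numerical error

    def run_tsallis_inf(loss_fn, K, T, seed=0):
        rng = np.random.default_rng(seed)
        Lhat = np.zeros(K)  # cumulative importance-weighted loss estimates
        for t in range(1, T + 1):
            eta = 1.0 / np.sqrt(t)  # assumed schedule; see the paper for exact tuning
            w = tsallis_inf_weights(Lhat, eta)
            arm = rng.choice(K, p=w)
            loss = loss_fn(t, arm)  # observed loss in [0, 1]
            Lhat[arm] += loss / w[arm]  # standard importance-weighted estimate
        return Lhat

Here loss_fn is a hypothetical stand-in for the environment, e.g. i.i.d. losses with up to $C$ adversarially corrupted rounds; the appeal of Tsallis-INF is that this one update rule needs no knowledge of which regime it is facing.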
Related papers
- LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits [38.41164102066483]
This study considers the linear contextual bandit problem with independent and identically distributed contexts.
Our proposed algorithm is based on Follow-The-Regularized-Leader with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
arXiv Detail & Related papers (2024-03-05T18:59:47Z) - Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under adversarial corruption.
We develop a strategy that is effective in both stochastic and adversarial environments, with theoretical guarantees.
We refer to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its theoretical guarantees in both regimes.
arXiv Detail & Related papers (2023-12-27T09:32:18Z) - Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z) - Corruption-Robust Offline Reinforcement Learning with General Function
Approximation [60.91257031278004]
We investigate the problem of corruption in offline reinforcement learning (RL) with general function approximation.
Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs).
arXiv Detail & Related papers (2023-10-23T04:07:26Z) - Tight Bounds for $\gamma$-Regret via the Decision-Estimation Coefficient [88.86699022151598]
We give a statistical characterization of the $\gamma$-regret for arbitrary structured bandit problems.
The $\gamma$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable.
arXiv Detail & Related papers (2023-03-06T17:54:33Z) - A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback [25.68113242132723]
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multiarmed bandits with delayed feedback.
It simultaneously achieves a near-optimal adversarial regret guarantee in the setting with fixed delays.
We also present an extension of the algorithm to the case of arbitrary delays.
arXiv Detail & Related papers (2022-06-29T20:49:45Z) - Corralling a Larger Band of Bandits: A Case Study on Switching Regret
for Linear Bandits [99.86860277006318]
We consider the problem of combining and learning over a set of adversarial algorithms with the goal of adaptively tracking the best one on the fly.
The CORRAL algorithm of Agarwal et al. achieves this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$, where $M$ is the number of base algorithms and $T$ is the time horizon.
Motivated by this issue, we propose a new recipe to corral a larger band of bandit algorithms whose regret overhead has only logarithmic dependence on $M$, as long as certain conditions are satisfied.
arXiv Detail & Related papers (2022-02-12T21:55:44Z) - Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(\varepsilon,\delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model.
Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - An Algorithm for Stochastic and Adversarial Bandits with Switching Costs [10.549307055348596]
We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played.
Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon.
arXiv Detail & Related papers (2021-02-19T11:03:51Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)