Better Best of Both Worlds Bounds for Bandits with Switching Costs
- URL: http://arxiv.org/abs/2206.03098v1
- Date: Tue, 7 Jun 2022 08:22:56 GMT
- Title: Better Best of Both Worlds Bounds for Bandits with Switching Costs
- Authors: Idan Amir, Guy Azov, Tomer Koren, Roi Livni
- Abstract summary: We study best-of-both-worlds algorithms for bandits with switching costs, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021.
We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting.
- Score: 37.71741656687868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study best-of-both-worlds algorithms for bandits with switching cost,
recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021. We introduce a
surprisingly simple and effective algorithm that simultaneously achieves
minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious
adversarial setting and a bound of $\mathcal{O}(\min\{\log
(T)/\Delta^2,T^{2/3}\})$ in the stochastically-constrained regime, both with
(unit) switching costs, where $\Delta$ is the gap between the arms. In the
stochastically constrained case, our bound improves over previous results due
to Rouyer et al., that achieved regret of $\mathcal{O}(T^{1/3}/\Delta)$. We
accompany our results with a lower bound showing that, in general,
$\tilde{\Omega}(\min\{1/\Delta^2,T^{2/3}\})$ regret is unavoidable in the
stochastically-constrained case for algorithms with $\mathcal{O}(T^{2/3})$
worst-case regret.
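The $\mathcal{O}(T^{2/3})$ rate under unit switching costs is commonly obtained by playing in mini-batches, so that the number of switches is bounded by the number of batches. The sketch below is not the authors' algorithm (which builds on Tsallis-INF); it is a generic batching wrapper around vanilla Exp3, with illustrative names (`Exp3`, `run_batched`), showing why a batch size of roughly $T^{1/3}$ balances the switching cost against the regret of playing only $T/B$ effective bandit rounds.

```python
import math
import random

class Exp3:
    """Vanilla Exp3 for adversarial bandits with losses in [0, 1]."""
    def __init__(self, k, eta):
        self.k, self.eta = k, eta
        self.weights = [0.0] * k  # log-weights: -eta * cumulative estimated loss

    def probs(self):
        m = max(self.weights)  # max-subtraction for numerical stability
        w = [math.exp(x - m) for x in self.weights]
        s = sum(w)
        return [x / s for x in w]

    def draw(self):
        return random.choices(range(self.k), weights=self.probs())[0]

    def update(self, arm, loss):
        p = self.probs()
        # Importance-weighted (unbiased) loss estimate for the played arm.
        self.weights[arm] -= self.eta * loss / p[arm]

def run_batched(base, losses, batch_size):
    """Play one arm per batch of rounds and feed the base algorithm the
    average batch loss. Switches are bounded by the number of batches,
    so batch_size ~ T^(1/3) gives at most ~T^(2/3) switches."""
    T = len(losses)
    total_loss, switches, prev = 0.0, 0, None
    for start in range(0, T, batch_size):
        arm = base.draw()
        if prev is not None and arm != prev:
            switches += 1
        prev = arm
        batch = losses[start:start + batch_size]
        total_loss += sum(row[arm] for row in batch)
        base.update(arm, sum(row[arm] for row in batch) / len(batch))
    return total_loss, switches
```

With $B \approx T^{1/3}$ the wrapper pays at most $T/B \approx T^{2/3}$ unit switching costs, while the base algorithm's $\sqrt{T/B}$-type regret is inflated by a factor $B$ to the same $T^{2/3}$ order; this is the classical batching trade-off, not the Tsallis-INF-based method of the paper.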
Related papers
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial combinatorial bandits with a switching cost $\lambda$ for each change of the selected arms between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm, and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{dT})$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z)
- Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
We study best-of-both-worlds algorithms for $K$-armed linear contextual bandits.
Our algorithms deliver near-optimal regret bounds in both the stochastic and adversarial regimes.
arXiv Detail & Related papers (2023-12-24T08:27:30Z)
- Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback [30.337826346496385]
We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback.
We introduce two algorithms that achieve improved regret performance compared to existing approaches.
arXiv Detail & Related papers (2023-10-17T19:43:37Z)
- Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits [3.5955736977697073]
In the single-pass setting with $K$ arms and $T$ trials, a regret lower bound of $\Omega(T^{2/3})$ has been proved for any algorithm with $o(K)$ memory.
In this paper, we improve the regret lower bound to $\Omega(K^{1/3}T^{2/3}/\log^{1/3}(T))$ for algorithms with $o(K)$ memory.
We show that the proposed algorithms consistently outperform the benchmark uniform exploration algorithm by a large margin, and on occasion, reduce the regret by up to 70%.
arXiv Detail & Related papers (2023-06-03T22:41:44Z)
- Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
- Online Convex Optimization with Continuous Switching Constraint [78.25064451417082]
We introduce the problem of online convex optimization with continuous switching constraint.
We show that, for strongly convex functions, the regret bound can be improved to $O(\log T)$ for $S=\Omega(\log T)$, and $O(\min\{T/\exp(S)+S,\,T\})$ for $S=O(\log T)$.
arXiv Detail & Related papers (2021-03-21T11:43:35Z)
- An Algorithm for Stochastic and Adversarial Bandits with Switching Costs [10.549307055348596]
We propose an algorithm for stochastic and adversarial multi-armed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played.
Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon.
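As a rough illustration of the sampling step in Tsallis-INF (the $1/2$-Tsallis-entropy version of Zimmert and Seldin), the sketch below computes the arm distribution $p_i \propto (\eta(\hat{L}_i - x))^{-2}$, where $\hat{L}_i$ are cumulative estimated losses and the normalizer $x < \min_i \hat{L}_i$ is found by binary search; the function name and iteration count are illustrative, not taken from either paper.

```python
import math

def tsallis_inf_probs(L_hat, eta, iters=100):
    """Sampling distribution of 1/2-Tsallis-INF:
    p_i = 4 / (eta * (L_hat[i] - x))**2, where x < min(L_hat) is the
    normalizer chosen so that the p_i sum to 1 (found by binary search)."""
    k = len(L_hat)
    # At x = min(L_hat) - 2*sqrt(k)/eta every term is at most 1/k, so sum <= 1;
    # as x -> min(L_hat) the sum diverges, so the root lies in (lo, hi).
    lo = min(L_hat) - 2.0 * math.sqrt(k) / eta
    hi = min(L_hat)

    def total(x):
        return sum(4.0 / (eta * (l - x)) ** 2 for l in L_hat)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2.0
    p = [4.0 / (eta * (l - x)) ** 2 for l in L_hat]
    s = sum(p)  # final renormalization absorbs residual binary-search error
    return [q / s for q in p]
```

Arms with smaller cumulative loss receive polynomially (rather than exponentially) larger probability, which is what drives the improved stochastic-regime guarantees of this family of algorithms.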
arXiv Detail & Related papers (2021-02-19T11:03:51Z)
- Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret minimization in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only relative binary "win-loss" feedback for this pair.
Our main result is an algorithm that minimizes the $T$-round regret compared to the Borda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.