Adversarial Combinatorial Bandits with Switching Costs
- URL: http://arxiv.org/abs/2404.01883v1
- Date: Tue, 2 Apr 2024 12:15:37 GMT
- Title: Adversarial Combinatorial Bandits with Switching Costs
- Authors: Yanyan Dong, Vincent Y. F. Tan
- Abstract summary: We study the problem of adversarial combinatorial bandits with a switching cost $\lambda$ for each switch of a selected arm in each round.
We derive lower bounds for the minimax regret and design algorithms to approach them.
- Score: 55.2480439325792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of adversarial combinatorial bandit with a switching cost $\lambda$ for a switch of each selected arm in each round, considering both the bandit feedback and semi-bandit feedback settings. In the oblivious adversarial case with $K$ base arms and time horizon $T$, we derive lower bounds for the minimax regret and design algorithms to approach them. To prove these lower bounds, we design stochastic loss sequences for both feedback settings, building on an idea from previous work in Dekel et al. (2014). The lower bound for bandit feedback is $ \tilde{\Omega}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$ while that for semi-bandit feedback is $ \tilde{\Omega}\big( (\lambda K I)^{\frac{1}{3}} T^{\frac{2}{3}}\big)$ where $I$ is the number of base arms in the combinatorial arm played in each round. To approach these lower bounds, we design algorithms that operate in batches by dividing the time horizon into batches to restrict the number of switches between actions. For the bandit feedback setting, where only the total loss of the combinatorial arm is observed, we introduce the Batched-Exp2 algorithm which achieves a regret upper bound of $\tilde{O}\big((\lambda K)^{\frac{1}{3}}T^{\frac{2}{3}}I^{\frac{4}{3}}\big)$ as $T$ tends to infinity. In the semi-bandit feedback setting, where all losses for the combinatorial arm are observed, we propose the Batched-BROAD algorithm which achieves a regret upper bound of $\tilde{O}\big( (\lambda K)^{\frac{1}{3}} (TI)^{\frac{2}{3}}\big)$.
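The abstract's central device is batching: the horizon is divided into batches, one combinatorial arm is drawn per batch and held fixed within it, so the switching cost is paid at most once per batch while an Exp2-style exponential-weights update is applied to the batch's observed loss. The following is a minimal toy sketch of that idea, not the paper's Batched-Exp2 algorithm itself: the batch length, learning rate, and the random "adversary" are illustrative assumptions, and the paper's precise tuning and loss constructions are not reproduced here.

```python
import itertools
import math
import random


def batched_exp2_sketch(K, I, T, lam, eta=0.1, seed=0):
    """Toy sketch of batching for combinatorial bandits with switching costs.

    K: number of base arms; I: base arms per combinatorial arm;
    T: horizon; lam: cost paid each time the played arm changes.
    Returns (total_loss, total_switching_cost).
    """
    rng = random.Random(seed)
    arms = list(itertools.combinations(range(K), I))  # all combinatorial arms
    weights = [1.0] * len(arms)

    # Heuristic batch length ~ T^{1/3}: longer batches mean fewer switches
    # (lower switching cost) but slower adaptation (higher ordinary regret).
    batch_len = max(1, round(T ** (1 / 3)))

    total_loss, switch_cost, prev_idx = 0.0, 0.0, None
    t = 0
    while t < T:
        # Sample one combinatorial arm from the exponential-weights
        # distribution; it is then played for the entire batch.
        z = sum(weights)
        r = rng.random() * z
        acc, idx = 0.0, 0
        for idx, w in enumerate(weights):
            acc += w
            if acc >= r:
                break
        if prev_idx is not None and idx != prev_idx:
            switch_cost += lam  # pay only at batch boundaries
        prev_idx = idx

        # Bandit feedback: only the total loss of the played arm is seen.
        steps = min(batch_len, T - t)
        batch_loss = 0.0
        for _ in range(steps):
            loss = sum(rng.random() for _ in arms[idx]) / I  # toy adversary
            batch_loss += loss
            total_loss += loss
        t += steps

        # Importance-weighted Exp2-style update on the average batch loss.
        p = weights[idx] / z
        estimate = (batch_loss / steps) / p
        weights[idx] *= math.exp(-eta * estimate)

    return total_loss, switch_cost
```

With `T = 30` the sketch uses batches of length 3, so at most 9 switches occur regardless of how often the weight distribution shifts; this is the mechanism by which batching caps the $\lambda$-dependent term in the regret.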
Related papers
- Adversarial Multi-dueling Bandits [0.4467766778351321]
We introduce the problem of regret minimization in adversarial multi-dueling bandits.
We introduce a novel algorithm, MiDEX (Multi Dueling EXP3), to learn from such preference feedback.
arXiv Detail & Related papers (2024-06-18T10:28:12Z) - Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under a delay.
We develop a novel algorithm, and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{d}T)$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Better Best of Both Worlds Bounds for Bandits with Switching Costs [37.71741656687868]
We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021.
We introduce a surprisingly simple and effective algorithm that simultaneously achieves the minimax optimal regret bound of $\mathcal{O}(T^{2/3})$ in the oblivious adversarial setting.
arXiv Detail & Related papers (2022-06-07T08:22:56Z) - Rotting Infinitely Many-armed Bandits [33.26370721280417]
We show an $\Omega(\max\{\varrho^{2/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time.
An algorithm that does not require knowledge of $\varrho$ is obtained by using an adaptive UCB index along with an adaptive threshold value.
arXiv Detail & Related papers (2022-01-31T03:07:17Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - An Algorithm for Stochastic and Adversarial Bandits with Switching Costs [10.549307055348596]
We propose an algorithm for adversarial multi-armed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played.
Our algorithm is based on an adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon.
arXiv Detail & Related papers (2021-02-19T11:03:51Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary "win-loss" feedback for this pair.
Our main result is an algorithm whose $T$-round regret, measured against the Borda-winner from a set of $K$ items, is sublinear.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.