Adversarial Combinatorial Semi-bandits with Graph Feedback
- URL: http://arxiv.org/abs/2502.18826v2
- Date: Fri, 28 Feb 2025 00:20:14 GMT
- Title: Adversarial Combinatorial Semi-bandits with Graph Feedback
- Authors: Yuxiao Wen,
- Abstract summary: We show that the optimal regret over a time horizon $T$ scales as $widetildeTheta(SsqrtT+sqrtalpha ST)$.<n>A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this framework to include \emph{graph feedback}, where the learner observes the rewards of all neighboring arms of the selected arms in a feedback graph $G$. We establish that the optimal regret over a time horizon $T$ scales as $\widetilde{\Theta}(S\sqrt{T}+\sqrt{\alpha ST})$, where $S$ is the size of the combinatorial decisions and $\alpha$ is the independence number of $G$. This result interpolates between the known regrets $\widetilde\Theta(S\sqrt{T})$ under full information (i.e., $G$ is complete) and $\widetilde\Theta(\sqrt{KST})$ under the semi-bandit feedback (i.e., $G$ has only self-loops), where $K$ is the total number of arms. A key technical ingredient is to realize a convexified action using a random decision vector with negative correlations.
Related papers
- Multi-objective Good Arm Identification with Bandit Feedback [0.589889361990138]
We consider a good arm identification problem in a bandit setting with multi-objectives.
For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $mathcalD_i_t$.
The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.
arXiv Detail & Related papers (2025-03-13T14:04:04Z) - Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the problem of adversarial bandit with a switching cost $lambda$ for a switch of each selected arm in each round.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z) - On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $mathbfm$-MAB and design an optimal PAC algorithm for its pure exploration version, $mathbfm$-BAI.
As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z) - Combinatorial Bandits for Maximum Value Reward Function under Max
Value-Index Feedback [9.771002043127728]
We consider a multi-armed bandit problem for maximum value reward function under maximum value and index feedback.
We propose an algorithm and provide a regret bound for problem instances with arm outcomes according to arbitrary distributions with finite supports.
Our algorithm achieves a $O((k/Delta)log(T))$ distribution-dependent and a $tildeO(sqrtT)$ distribution-independent regret.
arXiv Detail & Related papers (2023-05-25T14:02:12Z) - Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations [102.32996053572144]
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy independent evaluations of the true reward of each arm.
We derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated.
arXiv Detail & Related papers (2021-12-13T09:48:54Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $Oleft(frack2 n log Tepsilonright)$ gap-dependent regret bound as well as an $Oleft(k2sqrtn T log Tright)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(ksqrt(A-k+1)T log (|mathcalF|T))$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary win-loss' feedback for this pair.
Our main result is an algorithm whose $T$-round regret compared to the emphBorda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.