Understanding Bandits with Graph Feedback
- URL: http://arxiv.org/abs/2105.14260v1
- Date: Sat, 29 May 2021 09:35:28 GMT
- Title: Understanding Bandits with Graph Feedback
- Authors: Houshuang Chen (1), Zengfeng Huang (2), Shuai Li (1), and Chihao Zhang (1) ((1) Shanghai Jiao Tong University, (2) Fudan University)
- Abstract summary: We prove a general regret upper bound $O\left(\left(\delta^*\log |V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound $\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$.
We show that for several special families of graphs, we can get rid of the $\left(\log |V|\right)^{\frac{1}{3}}$ factor.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The bandit problem with graph feedback, proposed in [Mannor and Shamir,
NeurIPS 2011], is modeled by a directed graph $G=(V,E)$ where $V$ is the
collection of bandit arms, and once an arm is triggered, all its incident arms
are observed. A fundamental question is how the structure of the graph affects
the min-max regret. We propose the notions of the fractional weak domination
number $\delta^*$ and the $k$-packing independence number capturing upper bound
and lower bound for the regret respectively. We show that the two notions are
inherently connected via aligning them with the linear program of the weakly
dominating set and its dual -- the fractional vertex packing set respectively.
Based on this connection, we utilize the strong duality theorem to prove a
general regret upper bound $O\left(\left( \delta^*\log
|V|\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$ and a lower bound
$\Omega\left(\left(\delta^*/\alpha\right)^{\frac{1}{3}}T^{\frac{2}{3}}\right)$
where $\alpha$ is the integrality gap of the dual linear program. Therefore,
our bounds are tight up to a $\left(\log |V|\right)^{\frac{1}{3}}$ factor on
graphs with bounded integrality gap for the vertex packing problem including
trees and graphs with bounded degree. Moreover, we show that for several
special families of graphs, we can get rid of the $\left(\log
|V|\right)^{\frac{1}{3}}$ factor and establish optimal regret.
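The primal-dual connection described in the abstract can be sketched as a pair of linear programs. This is a reconstruction from the abstract alone, assuming an arc $(u,v)\in E$ means that pulling arm $u$ reveals the loss of arm $v$; the paper's exact constraint sets (e.g. the treatment of weakly observable vertices) may differ:

```latex
% Fractional weakly dominating set (primal); its optimum is \delta^*:
% every arm must be observable from the chosen fractional set.
\min_{x \ge 0} \; \sum_{v \in V} x_v
\quad \text{s.t.} \quad \sum_{u \,:\, (u,v) \in E} x_u \;\ge\; 1 \quad \forall v \in V.

% Fractional vertex packing (dual): no arm may observe
% more than one unit of total packing weight.
\max_{y \ge 0} \; \sum_{v \in V} y_v
\quad \text{s.t.} \quad \sum_{v \,:\, (u,v) \in E} y_v \;\le\; 1 \quad \forall u \in V.
```

By strong LP duality the two optima coincide, which is what lets the upper bound (stated via $\delta^*$) and the lower bound (stated via the packing value) match up to the integrality gap $\alpha$ of the dual program.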
Related papers
- Adversarial Combinatorial Bandits with Switching Costs [55.2480439325792]
We study the adversarial bandit problem with a switching cost $\lambda$ incurred each time the selected arm changes between rounds.
We derive lower bounds for the minimax regret and design algorithms to approach them.
arXiv Detail & Related papers (2024-04-02T12:15:37Z)
- On Interpolating Experts and Multi-Armed Bandits [1.9497955504539408]
We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI.
As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
arXiv Detail & Related papers (2023-07-14T10:38:30Z)
- Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, the paper presents an algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z)
- Stochastic Contextual Bandits with Graph-based Contexts [0.0]
We generalize the on-line graph prediction problem to a version of contextual bandit problems.
Our algorithm relies on the optimal bandit algorithm of Zimmert and Seldin [AISTATS'19, JMLR'21].
arXiv Detail & Related papers (2023-05-02T14:51:35Z)
- Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}\left(\left(\sum_{t=1}^{T}\alpha_t\right)^{1/2}+\max_{t\in[T]}\alpha_t\right)$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z)
- Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z)
- Laplacian State Transfer on Graphs with an Edge Perturbation Between Twin Vertices [0.0]
We consider quantum state transfer relative to the Laplacian matrix of a graph.
We investigate the existence of quantum state transfer between a pair of twin vertices in a graph when the edge between the vertices is perturbed.
arXiv Detail & Related papers (2021-09-11T15:48:18Z)
- Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{nT\log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z)
- Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary `win-loss' feedback for this pair.
Our main result is an algorithm with bounded $T$-round regret compared to the \emph{Borda-winner} from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
- A Closer Look at Small-loss Bounds for Bandits with Graph Feedback [39.78074016649885]
We study small-loss bounds for adversarial multi-armed bandits with graph feedback.
We derive the first small-loss bound for general strongly observable graphs.
We also take the first attempt at deriving small-loss bounds for weakly observable graphs.
arXiv Detail & Related papers (2020-02-02T03:48:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.