Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations
- URL: http://arxiv.org/abs/2012.05756v3
- Date: Wed, 17 Feb 2021 01:58:52 GMT
- Title: Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations
- Authors: Lingda Wang, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R.
Varshney, Zhizhen Zhao
- Abstract summary: A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on textttEXP3.
- Score: 80.95090605985042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the adversarial graphical contextual bandits, a variant of
adversarial multi-armed bandits that leverage two categories of the most common
side information: \emph{contexts} and \emph{side observations}. In this
setting, a learning agent repeatedly chooses from a set of $K$ actions after
being presented with a $d$-dimensional context vector. The agent not only
incurs and observes the loss of the chosen action, but also observes the losses
of its neighboring actions in the observation structures, which are encoded as
a series of feedback graphs. This setting models a variety of applications in
social networks, where both contexts and graph-structured side observations are
available. Two efficient algorithms are developed based on \texttt{EXP3}. Under
mild conditions, our analysis shows that for undirected feedback graphs the
first algorithm, \texttt{EXP3-LGC-U}, achieves the regret of order
$\mathcal{O}(\sqrt{(K+\alpha(G)d)T\log{K}})$ over the time horizon $T$, where
$\alpha(G)$ is the average \emph{independence number} of the feedback graphs. A
slightly weaker result is presented for the directed graph setting as well. The
second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of
problems, for which the regret is reduced to
$\mathcal{O}(\sqrt{\alpha(G)dT\log{K}\log(KT)})$ for both directed as well as
undirected feedback graphs. Numerical tests corroborate the efficiency of
proposed algorithms.
Related papers
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z) - Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R*$ for any graph and time horizon $T$.
Introducing an intricate exploration strategy, we define the mainAlgorithm algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z) - Stochastic Contextual Bandits with Graph-based Contexts [0.0]
We generalize the on-line graph prediction problem to a version of contextual bandit problems.
Our algorithm relies on the optimal bandit algorithm by Zimmert and Seldin[AISTAT'19, JMLR'21].
arXiv Detail & Related papers (2023-05-02T14:51:35Z) - Improved High-Probability Regret for Adversarial Bandits with
Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds.
We develop an algorithm that achieves the optimal regret $widetildemathcalO((sum_t=1Talpha_t)1/2+max_tin[T]alpha_t]$ with high probability.
We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z) - Simultaneously Learning Stochastic and Adversarial Bandits with General
Graph Feedback [15.429356827868514]
We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs.
We prove the proposed algorithm simultaneously achieves $mathrmpoly-designed log T$ regret in the adversarial setting.
This is the first best-of-both-worlds result for general feedback graphs.
arXiv Detail & Related papers (2022-06-16T04:21:27Z) - Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with
Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs.
We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z) - A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with
Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set.
We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z) - Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint emphgraph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k)
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z) - Random Graph Matching with Improved Noise Robustness [2.294014185517203]
We propose a new algorithm for graph matching under probabilistic models.
Our algorithm recovers the underlying matching with high probability when $alpha le 1 / (log log n)C$.
This improves the condition $alpha le 1 / (log n)C$ achieved in previous work.
arXiv Detail & Related papers (2021-01-28T02:39:27Z) - Dual ResGCN for Balanced Scene GraphGeneration [106.7828712878278]
We propose a novel model, dubbed textitdual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network.
The two networks are complementary to each other. The former captures object-level context information, textiti.e., the connections among objects.
The latter is carefully designed to explicitly capture relation-level context information textiti.e., the connections among relations.
arXiv Detail & Related papers (2020-11-09T07:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.