Related papers: Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

URL: http://arxiv.org/abs/2012.05756v3
Date: Wed, 17 Feb 2021 01:58:52 GMT
Title: Adversarial Linear Contextual Bandits with Graph-Structured Side Observations
Authors: Lingda Wang, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, Zhizhen Zhao
Abstract summary: A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures. Two efficient algorithms are developed based on textttEXP3.
Score: 80.95090605985042
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: \emph{contexts} and \emph{side observations}. In this setting, a learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on \texttt{EXP3}. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, \texttt{EXP3-LGC-U}, achieves the regret of order $\mathcal{O}(\sqrt{(K+\alpha(G)d)T\log{K}})$ over the time horizon $T$, where $\alpha(G)$ is the average \emph{independence number} of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, \texttt{EXP3-LGC-IX}, is developed for a special class of problems, for which the regret is reduced to $\mathcal{O}(\sqrt{\alpha(G)dT\log{K}\log(KT)})$ for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of proposed algorithms.

Related papers

G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs. We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs. G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
Online Learning with Feedback Graphs: The True Shape of Regret [82.00098840619847]
We prove that the minimax regret is proportional to $R*$ for any graph and time horizon $T$. Introducing an intricate exploration strategy, we define the mainAlgorithm algorithm that achieves the minimax optimal regret bound.
arXiv Detail & Related papers (2023-06-05T15:35:00Z)
Stochastic Contextual Bandits with Graph-based Contexts [0.0]
We generalize the on-line graph prediction problem to a version of contextual bandit problems. Our algorithm relies on the optimal bandit algorithm by Zimmert and Seldin[AISTAT'19, JMLR'21].
arXiv Detail & Related papers (2023-05-02T14:51:35Z)
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs [62.52390282012508]
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. We develop an algorithm that achieves the optimal regret $widetildemathcalO((sum_t=1Talpha_t)1/2+max_tin[T]alpha_t]$ with high probability. We also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs.
arXiv Detail & Related papers (2022-10-04T04:36:15Z)
Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model. It is assumed that the graph's structure is known and has $N$ nodes. Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z)
Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback [15.429356827868514]
We introduce a new trade-off mechanism for exploration and exploitation of general feedback graphs. We prove the proposed algorithm simultaneously achieves $mathrmpoly-designed log T$ regret in the adversarial setting. This is the first best-of-both-worlds result for general feedback graphs.
arXiv Detail & Related papers (2022-06-16T04:21:27Z)
Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs [34.37963000493442]
This study considers online learning with general directed feedback graphs. We present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments.
arXiv Detail & Related papers (2022-06-02T05:01:40Z)
A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs [21.563733343861713]
We consider online learning with feedback graphs, a sequential decision-making framework where the learner's feedback is determined by a directed graph over the action set. We present a computationally efficient algorithm for learning in this framework that simultaneously achieves near-optimal regret bounds in both and adversarial environments.
arXiv Detail & Related papers (2022-06-01T15:14:32Z)
Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint emphgraph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching. The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k) It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
arXiv Detail & Related papers (2021-09-01T08:24:02Z)
Random Graph Matching with Improved Noise Robustness [2.294014185517203]
We propose a new algorithm for graph matching under probabilistic models. Our algorithm recovers the underlying matching with high probability when $alpha le 1 / (log log n)C$. This improves the condition $alpha le 1 / (log n)C$ achieved in previous work.
arXiv Detail & Related papers (2021-01-28T02:39:27Z)
Dual ResGCN for Balanced Scene GraphGeneration [106.7828712878278]
We propose a novel model, dubbed textitdual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network. The two networks are complementary to each other. The former captures object-level context information, textiti.e., the connections among objects. The latter is carefully designed to explicitly capture relation-level context information textiti.e., the connections among relations.
arXiv Detail & Related papers (2020-11-09T07:44:17Z)
Inverse Graph Identification: Can We Identify Node Labels Given Graph Labels? [89.13567439679709]
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications. This paper defines a novel problem dubbed Inverse Graph Identification (IGI) We propose a simple yet effective method that makes the node-level message passing process using Graph Attention Network (GAT) under the protocol of GI.
arXiv Detail & Related papers (2020-07-12T12:06:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.