Graph Feedback Bandits on Similar Arms: With and Without Graph Structures
- URL: http://arxiv.org/abs/2501.14314v1
- Date: Fri, 24 Jan 2025 08:15:45 GMT
- Title: Graph Feedback Bandits on Similar Arms: With and Without Graph Structures
- Authors: Han Qi, Fei Guo, Li Zhu, Qiaosheng Zhang, Xuelong Li,
- Abstract summary: We study the multi-armed bandit problem with graph feedback.
We introduce two upper confidence bound (UCB)-based algorithms.
We extend these two UCB-based algorithms to the ballooning setting.
- Score: 49.02207254933986
- License:
- Abstract: In this paper, we study the stochastic multi-armed bandit problem with graph feedback. Motivated by applications in clinical trials and recommendation systems, we assume that two arms are connected if and only if they are similar (i.e., their means are close to each other). We establish a regret lower bound for this problem under the novel feedback structure and introduce two upper confidence bound (UCB)-based algorithms: Double-UCB, which has problem-independent regret upper bounds, and Conservative-UCB, which has problem-dependent upper bounds. Leveraging the similarity structure, we also explore a scenario where the number of arms increases over time (referred to as the \emph{ballooning setting}). Practical applications of this scenario include Q\&A platforms (e.g., Reddit, Stack Overflow, Quora) and product reviews on platforms like Amazon and Flipkart, where answers (or reviews) continuously appear, and the goal is to display the best ones at the top. We extend these two UCB-based algorithms to the ballooning setting. Under mild assumptions, we provide regret upper bounds for both algorithms and discuss their sub-linearity. Furthermore, we propose a new version of the corresponding algorithms that do not rely on prior knowledge of the graph's structural information and provide regret upper bounds. Finally, we conduct experiments to validate the theoretical results.
Related papers
- PageRank Bandits for Link Prediction [72.61386754332776]
Link prediction is a critical problem in graph learning with broad applications such as recommender systems and knowledge graph completion.
This paper reformulates link prediction as a sequential decision-making process, where each link prediction interaction occurs sequentially.
We propose a novel fusion algorithm, PRB (PageRank Bandits), which is the first to combine contextual bandits with PageRank for collaborative exploitation and exploration.
arXiv Detail & Related papers (2024-11-03T02:39:28Z) - Graph Feedback Bandits with Similar Arms [9.701475722399646]
We study the multi-armed bandit problem with graph feedback.
We introduce two UCB-based algorithms: D-UCB with problem-independent regret upper bounds and C-UCB with problem-dependent upper bounds.
arXiv Detail & Related papers (2024-05-18T04:20:14Z) - Problem Dependent View on Structured Thresholding Bandit Problems [73.70176003598449]
We investigate the problem dependent regime in the Thresholding Bandit problem (TBP)
The objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold.
We provide upper and lower bounds for the probability of error in both the concave and monotone settings.
arXiv Detail & Related papers (2021-06-18T15:01:01Z) - Beyond Bandit Feedback in Online Multiclass Classification [17.07011090727996]
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs.
arXiv Detail & Related papers (2021-06-07T13:22:30Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian
Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB)
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures.
Two efficient algorithms are developed based on textttEXP3.
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z) - Pair-Matching: Links Prediction with Adaptive Queries [7.22341371511072]
We show that sublinear regret is achievable in the case where the graph is generated according to a Block Model (SBM) with two communities.
The paper is concluded by a conjecture regarding the optimal regret when the number of communities is larger than 2.
arXiv Detail & Related papers (2019-05-17T15:57:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.