An $\alpha$-No-Regret Algorithm For Graphical Bilinear Bandits
- URL: http://arxiv.org/abs/2206.00466v1
- Date: Wed, 1 Jun 2022 12:55:17 GMT
- Title: An $\alpha$-No-Regret Algorithm For Graphical Bilinear Bandits
- Authors: Geovani Rizk, Igor Colin, Albert Thomas, Rida Laraki, Yann Chevaleyre
- Abstract summary: We propose the first regret-based approach to the Graphical Bilinear Bandits problem: an algorithm built on the principle of optimism in the face of uncertainty.
- Score: 15.29268368415036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the first regret-based approach to the Graphical Bilinear Bandits
problem, where $n$ agents in a graph play a stochastic bilinear bandit game
with each of their neighbors. This setting reveals a combinatorial NP-hard
problem that prevents the use of any existing regret-based algorithm in the
(bi-)linear bandit literature. In this paper, we fill this gap and present the
first regret-based algorithm for graphical bilinear bandits using the principle
of optimism in the face of uncertainty. Theoretical analysis of this new method
yields an upper bound of $\tilde{O}(\sqrt{T})$ on the $\alpha$-regret and
evidences the impact of the graph structure on the rate of convergence.
Finally, we show through various experiments the validity of our approach.
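To make the setting concrete, here is a minimal simulation of the reward model the abstract describes: each agent on a graph picks an arm, and every edge contributes a noisy bilinear reward $x^\top M^* y$. All names, sizes, and the brute-force search below are illustrative assumptions, not the paper's algorithm; the exponential search over joint allocations is precisely the combinatorial NP-hard obstacle that the paper's $\alpha$-approximate OFU approach avoids.
```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical problem instance (names and sizes are illustrative) ---
n_agents = 4
d = 3
# Undirected edges of the agent graph; each edge yields a bilinear reward.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Finite arm set shared by all agents: 5 arms in R^d.
arms = rng.normal(size=(5, d))
# Unknown parameter matrix M* of the bilinear model.
M_star = rng.normal(size=(d, d))

def global_reward(allocation, noise_scale=0.1):
    """Noisy global reward of an arm allocation (one arm index per agent).

    Each edge (i, j) contributes x_i^T M* x_j + noise, matching the
    stochastic bilinear reward model described in the abstract.
    """
    total = 0.0
    for i, j in edges:
        x, y = arms[allocation[i]], arms[allocation[j]]
        total += x @ M_star @ y + noise_scale * rng.normal()
    return total

# Exhaustive search over joint allocations is exponential in n_agents --
# this is the combinatorial subproblem the paper's alpha-regret algorithm
# sidesteps with an alpha-approximation inside the OFU loop.
best = max(
    (tuple(a) for a in np.ndindex(*(len(arms),) * n_agents)),
    key=lambda a: sum(arms[a[i]] @ M_star @ arms[a[j]] for i, j in edges),
)
print("optimal allocation:", best, "value:", global_reward(best, 0.0))
```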
Related papers
- The graph alignment problem: fundamental limits and efficient algorithms [0.9246334723892301]
The noisy version of the graph isomorphism problem aims to find a matching between the nodes of two graphs which preserves most of the edges.
This thesis focuses on understanding the fundamental information-theoretical limits for this problem, as well as designing and analyzing algorithms that are able to recover the underlying alignment in the data.
arXiv Detail & Related papers (2024-04-18T15:31:13Z) - Stochastic Graph Bandit Learning with Side-Observations [4.910658441596583]
We propose an algorithm that adapts to both the underlying graph structures and reward gaps.
To the best of our knowledge, our algorithm is the first to provide a gap-dependent upper bound in this setting.
arXiv Detail & Related papers (2023-08-29T08:14:19Z) - Revisiting Weighted Strategy for Non-stationary Parametric Bandits [82.1942459195896]
This paper revisits the weighted strategy for non-stationary parametric bandits.
We propose a refined analysis framework, which produces a simpler weight-based algorithm.
Our new framework can be used to improve regret bounds of other parametric bandits.
arXiv Detail & Related papers (2023-03-05T15:11:14Z) - Improved Algorithms for Bandit with Graph Feedback via Regret
Decomposition [2.3034251503343466]
The problem of bandit with graph feedback generalizes both the multi-armed bandit (MAB) problem and the learning with expert advice problem.
We propose a new algorithmic framework for the problem based on a partition of the feedback graph.
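As a rough illustration of the feedback-graph model this entry builds on (a hypothetical sketch, not the paper's framework): playing an arm reveals the losses of its neighbors, so a graph with only self-loops recovers the multi-armed bandit and a complete graph recovers learning with expert advice.
```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
losses = rng.uniform(size=K)  # this round's losses, one per arm

# Feedback graph as an adjacency matrix: playing arm i reveals the loss
# of every arm j with G[i, j] == 1.
G_mab = np.eye(K, dtype=int)          # self-loops only -> classic MAB
G_experts = np.ones((K, K), int)      # complete graph  -> expert advice

def observe(G, played):
    """Losses revealed when `played` is chosen under feedback graph G."""
    return {j: losses[j] for j in range(K) if G[played, j]}

print(observe(G_mab, 2))       # only arm 2's loss
print(observe(G_experts, 2))   # all K losses
```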
arXiv Detail & Related papers (2022-05-30T13:07:42Z) - Learning Sparse Graph with Minimax Concave Penalty under Gaussian Markov
Random Fields [51.07460861448716]
This paper presents a convex-analytic framework to learn sparse graphs from data.
We show that overall convexity is guaranteed by a transform of the matrix corresponding to its upper-triangular part.
arXiv Detail & Related papers (2021-09-17T17:46:12Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
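A generic sketch of the "meta-UCB" idea described above: run UCB1 over $N$ base bandit algorithms, treating each base learner as an arm. The base learners and reward model here are stand-ins assumed for illustration, not the paper's construction.
```python
import math, random

random.seed(0)

class EpsGreedyBase:
    """Stand-in base bandit algorithm (any no-regret learner would do)."""
    def __init__(self, k, eps):
        self.k, self.eps = k, eps
        self.counts = [0] * k
        self.means = [0.0] * k
    def act(self):
        if random.random() < self.eps or 0 in self.counts:
            return random.randrange(self.k)
        return max(range(self.k), key=lambda a: self.means[a])
    def update(self, a, r):
        self.counts[a] += 1
        self.means[a] += (r - self.means[a]) / self.counts[a]

def meta_ucb(bases, pull, T):
    """Treat each base algorithm as an arm of a higher-level UCB1 bandit.

    `pull(arm)` returns a stochastic reward in [0, 1]. This is a generic
    UCB1-over-algorithms sketch, not the paper's exact meta-procedure.
    """
    n = [0] * len(bases)
    mu = [0.0] * len(bases)
    for t in range(1, T + 1):
        if t <= len(bases):
            i = t - 1  # play each base algorithm once to initialize
        else:
            i = max(range(len(bases)),
                    key=lambda j: mu[j] + math.sqrt(2 * math.log(t) / n[j]))
        arm = bases[i].act()     # delegate the arm choice to base i
        r = pull(arm)
        bases[i].update(arm, r)  # base i learns from its own reward
        n[i] += 1
        mu[i] += (r - mu[i]) / n[i]
    return mu

means = [0.2, 0.5, 0.8]
pull = lambda a: 1.0 if random.random() < means[a] else 0.0
bases = [EpsGreedyBase(3, eps) for eps in (0.01, 0.1, 0.5)]
print(meta_ucb(bases, pull, 5000))  # estimated value of each base learner
```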
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Best Arm Identification in Graphical Bilinear Bandits [9.052414772641011]
We introduce a new graphical bilinear bandit problem where a learner allocates arms to the nodes of a graph.
We study the best arm identification problem in which the learner wants to find the graph allocation maximizing the sum of the bilinear rewards.
arXiv Detail & Related papers (2020-12-14T15:25:23Z) - Adversarial Linear Contextual Bandits with Graph-Structured Side
Observations [80.95090605985042]
A learning agent repeatedly chooses from a set of $K$ actions after being presented with a $d$-dimensional context vector.
The agent incurs and observes the loss of the chosen action, and also observes the losses of neighboring actions in the graph-structured observation model.
Two efficient algorithms are developed based on EXP3.
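A generic EXP3-with-side-observations sketch in the spirit of this entry (not the paper's exact algorithms, and without its contextual structure): each observed loss is importance-weighted by the probability of observing it, i.e. the probability of playing any in-neighbor of that arm.
```python
import numpy as np

rng = np.random.default_rng(2)
K, T, eta = 5, 2000, 0.05

# Feedback graph: playing arm i also reveals the losses of its neighbors.
G = np.eye(K, dtype=int)
for i in range(K - 1):
    G[i, i + 1] = G[i + 1, i] = 1

true_means = np.linspace(0.2, 0.8, K)  # illustrative Bernoulli loss means
w = np.zeros(K)                        # log-weights of exponential weighting

for t in range(T):
    p = np.exp(w - w.max())
    p /= p.sum()
    a = rng.choice(K, p=p)
    losses = rng.binomial(1, true_means)  # this round's losses, all arms
    for j in range(K):
        if G[a, j]:
            # Importance-weight by the probability that arm j is observed,
            # i.e. the chance of playing any in-neighbor of j.
            obs_prob = p[G[:, j] == 1].sum()
            w[j] -= eta * losses[j] / obs_prob

print("play distribution after learning:", np.round(p, 3))
```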
arXiv Detail & Related papers (2020-12-10T15:40:07Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Online Dense Subgraph Discovery via Blurred-Graph Feedback [87.9850024070244]
We introduce a novel learning problem for dense subgraph discovery.
We first propose an edge-time algorithm that obtains a nearly-optimal solution with high probability.
We then design a more scalable algorithm with a theoretical guarantee.
arXiv Detail & Related papers (2020-06-24T11:37:33Z) - Bandit algorithms to emulate human decision making using probabilistic
distortions [20.422725678982726]
We formulate two stochastic multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in the regret minimization as well as best arm identification framework for multi-armed bandits.
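To illustrate what a "distorted probability" on a reward distribution can mean, here is a sketch of a distorted value estimate using a Prelec weighting function applied to the empirical tail probabilities; the specific weight function and estimator are illustrative assumptions, not necessarily the paper's.
```python
import numpy as np

def prelec(p, alpha=0.65):
    """Prelec probability weighting: overweights small probabilities."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    pos = p > 0
    out[pos] = np.exp(-(-np.log(p[pos])) ** alpha)
    return out

def distorted_value(samples, w=prelec):
    """Empirical distorted expectation, i.e. the integral of w(P(X > x)) dx
    for a nonnegative reward X, computed from sorted samples."""
    x = np.sort(samples)
    n = len(x)
    upper = np.arange(n, 0, -1) / n        # tail prob (n-i+1)/n, i = 1..n
    lower = np.arange(n - 1, -1, -1) / n   # tail prob (n-i)/n
    return float(x @ (w(upper) - w(lower)))

rng = np.random.default_rng(3)
samples = rng.exponential(1.0, size=10_000)
print("plain mean:", samples.mean())                 # ~1.0
print("distorted value:", distorted_value(samples))  # shifted by weighting
```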
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.