Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference
- URL: http://arxiv.org/abs/2503.07555v1
- Date: Mon, 10 Mar 2025 17:25:24 GMT
- Title: Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference
- Authors: Fateme Jamshidi, Mohammad Shahverdikondori, Negar Kiyavash
- Abstract summary: Multi-armed bandits (MABs) are frequently used for online sequential decision-making in applications ranging from recommending personalized content to assigning treatments to patients. We study the MAB problem under network interference, where each unit's reward depends on its own treatment and those of its neighbors in a given interference graph. We propose a novel algorithm that uses the local structure of the interference graph to minimize regret.
- Score: 18.101059380671582
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-armed bandits (MABs) are frequently used for online sequential decision-making in applications ranging from recommending personalized content to assigning treatments to patients. A recurring challenge in applying the classic MAB framework to real-world settings is that it ignores *interference*, where a unit's outcome depends on the treatments assigned to other units. This leads to an exponentially growing action space, rendering standard approaches computationally impractical. We study the MAB problem under network interference, where each unit's reward depends on its own treatment and those of its neighbors in a given interference graph. We propose a novel algorithm that uses the local structure of the interference graph to minimize regret. We derive a graph-dependent upper bound on cumulative regret, showing that it improves over prior work. Additionally, we provide the first lower bounds for bandits with arbitrary network interference, where each bound involves a distinct structural property of the interference graph. These bounds demonstrate that when the graph is either dense or sparse, our algorithm is nearly optimal, with upper and lower bounds that match up to logarithmic factors. We complement our theoretical results with numerical experiments, which show that our approach outperforms baseline methods.
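To make the setting concrete, the sketch below shows one way a learner could exploit the local structure of the interference graph: it keeps UCB statistics per unit over local treatment configurations (the unit's own action plus its neighbors' actions) instead of over the exponential joint action space. This is a minimal illustrative sketch only, not the authors' algorithm; the function name `local_ucb_interference`, the greedy coordinate pass used to build a joint assignment, and the `reward_fn` interface are assumptions, with units labeled 0..N-1 and rewards assumed to lie in [0, 1].

```python
# Illustrative sketch (not the paper's algorithm): per-unit UCB over local
# treatment configurations under network interference.
from itertools import product

import numpy as np


def local_ucb_interference(graph, n_actions, T, reward_fn, seed=0):
    """graph: dict {unit: list of neighbours} (the interference graph).
    reward_fn(assignment) -> per-unit rewards (array of length N, values in [0, 1])."""
    rng = np.random.default_rng(seed)
    units = sorted(graph)
    # For each unit, keep (count, reward sum) for every *local* configuration:
    # the unit's own action together with its neighbours' actions. There are
    # n_actions ** (degree + 1) such configurations, so the bookkeeping depends
    # only on local graph structure, not on the joint action space.
    stats = {
        u: {cfg: [0, 0.0]
            for cfg in product(range(n_actions), repeat=len(graph[u]) + 1)}
        for u in units
    }

    def local_cfg(u, assignment):
        return (assignment[u],) + tuple(assignment[v] for v in graph[u])

    total = 0.0
    for t in range(1, T + 1):
        # Heuristic joint assignment: start from a random draw, then let each
        # unit greedily pick the action with the highest UCB index given the
        # current tentative actions of its neighbours.
        assignment = list(rng.integers(n_actions, size=len(units)))
        for u in units:
            best_a, best_val = assignment[u], -np.inf
            for a in range(n_actions):
                assignment[u] = a
                n, s = stats[u][local_cfg(u, assignment)]
                val = np.inf if n == 0 else s / n + np.sqrt(2 * np.log(t) / n)
                if val > best_val:
                    best_a, best_val = a, val
            assignment[u] = best_a
        rewards = reward_fn(assignment)
        total += float(np.sum(rewards))
        # Update each unit's statistics for the local configuration it observed.
        for u in units:
            cnt = stats[u][local_cfg(u, assignment)]
            cnt[0] += 1
            cnt[1] += rewards[u]
    return total

# Example (toy): 4 units on a path graph with 2 treatments, where each unit's
# reward depends on its own treatment and its neighbours' through reward_fn:
#   local_ucb_interference({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, 2, 1000, my_reward_fn)
```

The point of the per-unit tables is that their size is $|\mathcal{A}|^{d_i+1}$ for a unit of degree $d_i$, so the cost of exploration scales with local degrees rather than with the $|\mathcal{A}|^N$ joint action space; this is the intuition behind graph-dependent regret bounds.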
Related papers
- Influential Bandits: Pulling an Arm May Change the Environment [44.71145269686588]
Real-world applications often involve non-stationary environments and interdependencies between arms.
We propose the influential bandit problem, which models inter-arm interactions through an unknown, symmetric, positive semi-definite interaction matrix.
We introduce a new algorithm based on a lower confidence bound (LCB) estimator tailored to the structure of the loss dynamics.
arXiv Detail & Related papers (2025-04-11T02:05:51Z)
- Causal bandits with backdoor adjustment on unknown Gaussian DAGs [5.807183284468881]
We study the causal bandit problem when the graph structure is unknown. We identify backdoor adjustment sets for each arm using sequentially generated experimental and observational data. We develop a novel bandit algorithm, based on modified upper confidence bounds, to sequentially determine the optimal intervention.
arXiv Detail & Related papers (2025-02-04T05:18:35Z)
- Influence Maximization via Graph Neural Bandits [54.45552721334886]
We set the IM problem in a multi-round diffusion campaign, aiming to maximize the number of distinct users that are influenced.
We propose the framework IM-GNB (Influence Maximization with Graph Neural Bandits), where we provide an estimate of the users' probabilities of being influenced.
arXiv Detail & Related papers (2024-06-18T17:54:33Z)
- Multi-Armed Bandits with Network Interference [5.708360037632367]
We study a multi-armed bandit (MAB) problem where a learner sequentially assigns one of $\mathcal{A}$ possible actions (discounts) to $N$ units (goods) over $T$ rounds to minimize regret.
Unlike traditional MAB problems, the reward of each unit depends on the treatments assigned to other units, i.e., there is interference across the underlying network of units.
We propose simple, linear regression-based algorithms to minimize regret.
arXiv Detail & Related papers (2024-05-28T22:01:50Z)
- An Effective Branch-and-Bound Algorithm with New Bounding Methods for the Maximum $s$-Bundle Problem [9.400610985728441]
The Maximum s-Bundle Problem (MBP) addresses the task of identifying a maximum s-bundle in a given graph.
We introduce a novel Partition-based Upper Bound (PUB) that leverages the graph partitioning technique to achieve a tighter upper bound.
We also propose a new BnB algorithm that uses the initial lower bound and PUB in preprocessing for graph reduction.
arXiv Detail & Related papers (2024-02-06T06:05:11Z)
- Graph Signal Sampling for Inductive One-Bit Matrix Completion: a Closed-form Solution [112.3443939502313]
We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing.
The key idea is to transform each user's ratings on the items to a function (signal) on the vertices of an item-item graph.
For the online setting, we develop a Bayesian extension, i.e., BGS-IMC which considers continuous random Gaussian noise in the graph Fourier domain.
arXiv Detail & Related papers (2023-02-08T08:17:43Z)
- Local Graph-homomorphic Processing for Privatized Distributed Systems [57.14673504239551]
We show that the added noise does not affect the performance of the learned model.
This is a significant improvement to previous works on differential privacy for distributed algorithms.
arXiv Detail & Related papers (2022-10-26T10:00:14Z)
- Inverse Boundary Value and Optimal Control Problems on Graphs: A Neural and Numerical Synthesis [0.0]
A key piece in the present architecture is our boundary injected message passing neural network.
A regularization technique based on graphical distance is introduced that helps with stabilizing the predictions at nodes far from the boundary.
arXiv Detail & Related papers (2022-06-06T21:26:23Z)
- Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
- Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems [30.759279275710078]
We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve an over 150x speed-up over existing methods in these tasks without loss in performance.
arXiv Detail & Related papers (2021-03-08T13:10:31Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
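The entry above (An Integer Linear Programming Framework for Mining Constraints from Data) frames inference in structured output prediction as an ILP. As a rough, self-contained illustration of that framing, and not code from the paper, the toy below scores three output slots and decodes them under a hand-written mutual-exclusion constraint standing in for a mined one; the score matrix and constraint are made up, and `scipy.optimize.milp` is used only as an off-the-shelf ILP solver.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Toy "inference as ILP": pick one label per output slot to maximise a score,
# subject to a constraint that stands in for one mined from data.
scores = np.array([[3.0, 1.0, 2.0],   # scores[i, j]: score for giving slot i label j
                   [2.0, 4.0, 1.0],
                   [1.0, 2.0, 5.0]])
n_slots, n_labels = scores.shape
c = -scores.ravel()                   # milp minimises, so negate to maximise

# Structural constraint: every slot receives exactly one label.
one_label = np.zeros((n_slots, n_slots * n_labels))
for i in range(n_slots):
    one_label[i, i * n_labels:(i + 1) * n_labels] = 1.0

# Stand-in for a mined constraint: no label is used by more than one slot.
distinct = np.zeros((n_labels, n_slots * n_labels))
for j in range(n_labels):
    distinct[j, j::n_labels] = 1.0

res = milp(
    c,
    constraints=[LinearConstraint(one_label, lb=1.0, ub=1.0),
                 LinearConstraint(distinct, lb=0.0, ub=1.0)],
    integrality=np.ones(n_slots * n_labels),   # all decision variables are binary
    bounds=Bounds(0.0, 1.0),
)
assignment = res.x.reshape(n_slots, n_labels).argmax(axis=1)
print("label chosen for each slot:", assignment)   # -> [0 1 2]
```

In the paper's setting the constraint matrices would be learned from examples rather than written by hand, as the summary describes; the decoding step remains the same kind of ILP.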
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.