Networked Restless Bandits with Positive Externalities
- URL: http://arxiv.org/abs/2212.05144v1
- Date: Fri, 9 Dec 2022 23:37:14 GMT
- Title: Networked Restless Bandits with Positive Externalities
- Authors: Christine Herlihy, John P. Dickerson
- Abstract summary: We introduce networked restless bandits, a novel multi-armed bandit setting in which arms are both restless and embedded within a directed graph.
We then present Greta, a graph-aware, Whittle index-based algorithm that can be used to efficiently construct a constrained reward-maximizing action vector at each timestep.
- Score: 34.792869761921565
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Restless multi-armed bandits are often used to model budget-constrained
resource allocation tasks where receipt of the resource is associated with an
increased probability of a favorable state transition. Prior work assumes that
individual arms only benefit if they receive the resource directly. However,
many allocation tasks occur within communities and can be characterized by
positive externalities that allow arms to derive partial benefit when their
neighbor(s) receive the resource. We thus introduce networked restless bandits,
a novel multi-armed bandit setting in which arms are both restless and embedded
within a directed graph. We then present Greta, a graph-aware, Whittle
index-based heuristic algorithm that can be used to efficiently construct a
constrained reward-maximizing action vector at each timestep. Our empirical
results demonstrate that Greta outperforms comparison policies across a range
of hyperparameter values and graph topologies.
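The abstract does not spell out Greta's mechanics, but the core idea of combining per-arm Whittle indices with graph-mediated positive externalities can be sketched minimally. Everything below (the score function, the spillover weight `alpha`, the toy graph) is an illustrative assumption, not the paper's actual algorithm.

```python
# Hypothetical sketch: graph-aware top-k selection combining Whittle indices
# with positive externalities. Not the paper's actual Greta algorithm; the
# additive spillover score and `alpha` are illustrative assumptions.

def graph_aware_select(whittle, adj, budget, alpha=0.5):
    """Pick `budget` arms maximizing own Whittle index plus a discounted
    share of the indices of out-neighbors (who benefit indirectly)."""
    n = len(whittle)
    scores = []
    for i in range(n):
        spillover = alpha * sum(whittle[j] for j in adj.get(i, []))
        scores.append((whittle[i] + spillover, i))
    chosen = sorted(scores, reverse=True)[:budget]
    return sorted(i for _, i in chosen)

# Toy example: arm 0 has a modest index but benefits two neighbors,
# so it is selected ahead of arm 2 despite arm 2's higher own index.
whittle = [0.4, 0.9, 0.3, 0.2]
adj = {0: [2, 3], 1: []}   # directed edges: 0 -> 2, 0 -> 3
print(graph_aware_select(whittle, adj, budget=2))  # → [0, 1]
```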
Related papers
- Lagrangian Relaxation for Multi-Action Partially Observable Restless Bandits: Heuristic Policies and Indexability [0.0]
We study multi-action partially observable restless multi-armed bandits. The state of a bandit is not observable, but one of finitely many feedback signals is observable. The computation of optimal value functions for finite-state, finite-action POMDPs is non-trivial.
arXiv Detail & Related papers (2025-08-30T08:47:33Z) - Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms [80.05851139852311]
We propose a novel policy to learn reward environments over a continuous space using Gaussian. We show that our method efficiently learns continuous Lipschitz reward functions with $\mathcal{O}^*(\sqrt{T})$ cumulative regret. Furthermore, our method naturally extends to non-stationary problems with a simple modification.
arXiv Detail & Related papers (2025-05-30T15:15:18Z) - Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
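The rising/rotting distinction above amounts to the expected reward being a monotone function of an arm's trigger count. A minimal sketch, where the concave growth and geometric decay forms are purely illustrative assumptions:

```python
# Illustrative sketch of rising vs. rotting arms: expected reward is a
# monotone function of the trigger count. The specific functional forms
# (bounded concave growth, geometric decay) are assumptions.

def rising_reward(triggers, cap=1.0, rate=0.5):
    """Expected reward grows with the number of triggers (bounded above)."""
    return cap * (1 - (1 - rate) ** triggers)

def rotting_reward(triggers, start=1.0, decay=0.8):
    """Expected reward shrinks with the number of triggers."""
    return start * decay ** triggers

rise = [rising_reward(t) for t in range(4)]
rot = [rotting_reward(t) for t in range(4)]
assert all(a <= b for a, b in zip(rise, rise[1:]))  # monotonically rising
assert all(a >= b for a, b in zip(rot, rot[1:]))    # monotonically rotting
```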
arXiv Detail & Related papers (2024-09-09T18:23:07Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents [52.75161794035767]
We introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously.
We show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second price auctions with our algorithms.
arXiv Detail & Related papers (2023-12-13T06:54:49Z) - Reproducible Bandits [95.8830340560603]
A policy in the bandit environment is called reproducible if it pulls, with high probability, the exact same sequence of arms in two different executions.
We show that not only do reproducible policies exist, but also they achieve almost the same optimal (non-reproducible) regret bounds in terms of the time horizon.
Our results show that even though randomization is crucial for the exploration-exploitation trade-off, an optimal balance can still be achieved while pulling the exact same arms in two different rounds of executions.
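The reproducibility notion above can be made concrete with a tiny sketch: a policy whose randomization all flows from one shared seed pulls the exact same arm sequence in two executions. The epsilon-greedy form and stand-in reward draws are assumptions for illustration, not the paper's construction.

```python
import random

# Minimal sketch of a reproducible policy: every random choice is driven by
# a shared seed, so two executions pull identical arm sequences. The
# epsilon-greedy rule and the reward model are illustrative assumptions.

def run_policy(seed, n_arms=3, horizon=10, eps=0.3):
    rng = random.Random(seed)          # all randomness flows from the seed
    means = [0.0] * n_arms
    counts = [0] * n_arms
    pulls = []
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        pulls.append(arm)
        reward = rng.random()          # stand-in reward draw
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return pulls

# Two executions with the same seed pull the exact same sequence of arms.
assert run_policy(42) == run_policy(42)
```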
arXiv Detail & Related papers (2022-10-04T20:36:45Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB)
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
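The temporally-partitioned structure described above can be sketched simply: a pull's total reward is spread over a finite span of subsequent rounds, and the learner only observes the fragments released so far. The uniform partition below is an assumption; TP-UCB-FR and TP-UCB-EW themselves are not reproduced here.

```python
# Illustrative sketch of temporally-partitioned rewards: each pull's total
# reward is dilated over a finite span, observed as per-round fragments.
# The uniform split is an assumption for illustration.

def partition_reward(total, span):
    """Split a pull's total reward into `span` equal per-round fragments."""
    return [total / span] * span

def observed_by(fragments, rounds_elapsed):
    """Partial reward visible `rounds_elapsed` rounds after the pull."""
    return sum(fragments[:rounds_elapsed])

frags = partition_reward(6.0, span=3)
print(frags)                  # → [2.0, 2.0, 2.0]
print(observed_by(frags, 2))  # → 4.0 seen so far; full 6.0 after the span
```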
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Asymptotically Optimal Bandits under Weighted Information [10.939683083130616]
We study the problem of regret in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round.
We propose a Thompson-Sampling-based strategy, that designs the power profile as its posterior belief of each arm being the best arm.
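A posterior-belief power profile of the kind described can be sketched by Monte Carlo: draw from each arm's posterior and count how often each arm comes out on top. Bernoulli arms with Beta posteriors are an assumption here, not necessarily the paper's model.

```python
import random

# Sketch of a posterior-belief power profile: allocate "power" across arms
# in proportion to the posterior probability that each arm is best,
# estimated by Monte Carlo draws. Bernoulli arms with Beta(1,1) priors
# are an illustrative assumption.

def power_profile(successes, failures, draws=5000, seed=0):
    rng = random.Random(seed)
    n = len(successes)
    wins = [0] * n
    for _ in range(draws):
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

profile = power_profile(successes=[8, 2], failures=[2, 8])
print(profile)  # most of the mass lands on arm 0; the profile sums to 1
```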
arXiv Detail & Related papers (2021-05-28T21:28:44Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arms can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on ...
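The greedy step described above can be sketched minimally: among currently available (non-blocked) arms, take the highest expected rewards subject to feasibility. The cardinality constraint and blocking bookkeeping below are illustrative stand-ins for the paper's general feasibility structure.

```python
# Hypothetical sketch of the greedy step: play the highest-expected-reward
# arms among those not currently blocked, subject to a simple cardinality
# feasibility constraint (an illustrative stand-in).

def greedy_feasible(expected, blocked_until, t, max_plays):
    """Play up to `max_plays` available arms in decreasing expected reward."""
    available = [a for a in range(len(expected)) if blocked_until[a] <= t]
    available.sort(key=lambda a: expected[a], reverse=True)
    return available[:max_plays]

# Arm 1 is still blocked at t=5, so greedy falls back to arms 2 and 0.
print(greedy_feasible(expected=[0.3, 0.9, 0.6],
                      blocked_until=[0, 7, 2], t=5, max_plays=2))  # → [2, 0]
```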
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs [11.1546439770774]
We present a new type of acquisition functions for online decision making in bandit problems with extreme payoffs.
We formulate a novel type of upper confidence bound (UCB) acquisition function that guides exploration towards the bandits that are deemed most relevant.
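For reference, the classic UCB index that such acquisition functions build on is easy to state; the paper's output-weighted variant, which reweights exploration toward extreme payoffs, is not reproduced here.

```python
import math

# Classic UCB1-style index for reference: empirical mean plus an
# exploration bonus that shrinks with the pull count. The output-weighted
# acquisition described above modifies this and is not reproduced here.

def ucb_scores(means, counts, t, c=2.0):
    """UCB index per arm: empirical mean + sqrt(c * ln t / pulls)."""
    return [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]

scores = ucb_scores(means=[0.5, 0.4], counts=[10, 2], t=12)
# The under-explored arm gets a larger bonus and overtakes arm 0.
print(scores.index(max(scores)))  # → 1
```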
arXiv Detail & Related papers (2021-02-19T18:36:03Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
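The intuition for mode-based robustness can be sketched directly: corrupting a minority of samples can move the empirical mean arbitrarily far, but leaves the empirical mode unchanged. Discrete rewards are assumed here for simplicity.

```python
from collections import Counter

# Sketch of why the mode is a robust target: two adversarially corrupted
# samples drag the mean far away, but the mode is unaffected. Discrete
# rewards are an assumption for illustration.

def empirical_mode(samples):
    return Counter(samples).most_common(1)[0][0]

clean = [1, 1, 1, 1, 2, 3]
corrupted = [1, 1, 1, 1, 100, 100]       # two samples replaced by outliers
assert empirical_mode(clean) == empirical_mode(corrupted) == 1
print(sum(corrupted) / len(corrupted))   # → 34.0: the mean is dragged away
```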
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.