Networked Restless Multi-Arm Bandits with Reinforcement Learning
- URL: http://arxiv.org/abs/2512.06274v1
- Date: Sat, 06 Dec 2025 03:53:25 GMT
- Title: Networked Restless Multi-Arm Bandits with Reinforcement Learning
- Authors: Hanmo Zhang, Zenghui Sun, Kai Wang
- Abstract summary: This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model. We present its computational challenge due to exponentially large action and state spaces. We experimentally verify these results by developing an efficient Q-learning algorithm tailored to the networked setting.
- Score: 4.0539039756740785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied in resource allocation and intervention optimization challenges in public health. However, traditional RMABs assume independence among arms, limiting their ability to account for interactions between individuals that can be common and significant in real-world environments. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We define the Bellman equation for networked RMAB and present its computational challenge due to exponentially large action and state spaces. To resolve the computational challenge, we establish the submodularity of the Bellman equation and apply a hill-climbing algorithm to achieve a $1-\frac{1}{e}$ approximation guarantee in Bellman updates. Lastly, we prove that the approximate Bellman updates are guaranteed to converge by a modified contraction analysis. We experimentally verify these results by developing an efficient Q-learning algorithm tailored to the networked setting. Experimental results on real-world graph data demonstrate that our Q-learning approach outperforms both $k$-step look-ahead and network-blind approaches, highlighting the importance of capturing and leveraging network effects where they exist.
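The hill-climbing step the abstract describes can be illustrated with a minimal sketch: greedily adding the arm with the largest marginal gain to a monotone submodular objective is within a factor $1-\frac{1}{e}$ of the best budgeted set. The coverage-style objective and toy arm graph below are hypothetical stand-ins for the paper's Bellman-update objective, not its actual implementation.

```python
# Hedged sketch: greedy hill-climbing for a budgeted arm-selection step.
# The "value" of pulling a set of arms is modeled here as neighborhood
# coverage in the arm graph -- a monotone submodular stand-in for the
# paper's Bellman-update objective (names and graph are illustrative).

def coverage_value(graph, arms):
    """f(S): number of nodes reached by pulling the arms in S."""
    reached = set()
    for a in arms:
        reached.add(a)
        reached.update(graph.get(a, ()))
    return len(reached)

def greedy_arm_selection(graph, budget):
    """Hill-climbing: repeatedly add the arm with the largest marginal
    gain until the budget is spent.  For a monotone submodular f this
    achieves at least a (1 - 1/e) fraction of the optimal value."""
    chosen = []
    candidates = set(graph)
    for _ in range(budget):
        best, best_gain = None, -1
        for a in candidates:
            gain = coverage_value(graph, chosen + [a]) - coverage_value(graph, chosen)
            if gain > best_gain:
                best, best_gain = a, gain
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Toy arm graph: node -> neighbors influenced when that node is pulled.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [], 4: [5, 6], 5: [], 6: []}
print(greedy_arm_selection(graph, 2))
```

With budget 2, the greedy pass picks one arm from each of the two clusters, since the second arm from the same cluster has near-zero marginal coverage.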
Related papers
- Mitigating Reward Hacking in RLHF via Bayesian Non-negative Reward Modeling [49.41422138354821]
We propose a principled reward modeling framework that integrates non-negative factor analysis into the Bradley-Terry preference model. BNRM represents rewards through a sparse, non-negative latent factor generative process. We show that BNRM substantially mitigates reward over-optimization, improves robustness under distribution shifts, and yields more interpretable reward decompositions than strong baselines.
arXiv Detail & Related papers (2026-02-11T08:14:11Z) - Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework [66.2103745798444]
Saliency Mamba (Samba) is a pure Mamba-based architecture that flexibly handles various distinct salient object detection tasks. Samba individually outperforms existing methods across six SOD tasks on 22 datasets with lower computational cost. Samba+ achieves even superior results on these tasks and datasets by using a single trained versatile model.
arXiv Detail & Related papers (2026-02-02T03:34:25Z) - Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise [2.421459418045937]
We present a novel numerical method for solving McKean-Vlasov forward-backward stochastic differential equations (MV-FBSDEs) with common noise. The key innovation involves elicitability to derive a path-wise loss function, enabling efficient training of neural networks to approximate both the backward process and the conditional expectations arising from common noise. We validate the algorithm on a systemic risk inter-bank borrowing and lending model, where analytical solutions exist, demonstrating accurate recovery of the true solution.
arXiv Detail & Related papers (2025-12-16T23:39:31Z) - Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z) - A Federated Online Restless Bandit Framework for Cooperative Resource Allocation [23.698976872351576]
We study the cooperative resource allocation problem with unknown system dynamics of MRPs.
We put forth a Federated Thompson-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T}\log(T))$ and better performance compared with baselines.
arXiv Detail & Related papers (2024-06-12T08:34:53Z) - LoRA-Ensemble: Efficient Uncertainty Modelling for Self-Attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient ensembling method for self-attention networks. The method not only outperforms state-of-the-art implicit techniques like BatchEnsemble, but even matches or exceeds the accuracy of an Explicit Ensemble.
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning [22.287106840756483]
We show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent challenges of Bellman completeness.
We propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories.
arXiv Detail & Related papers (2023-10-30T07:03:14Z) - Networked Restless Multi-Armed Bandits for Mobile Interventions [41.74987432512137]
We study restless multi-armed bandits (RMABs) with network effects.
In our model, arms are partially recharging and connected through a graph, so that pulling one arm also improves the state of neighboring arms.
We show that network effects in RMABs induce strong reward coupling that is not accounted for by existing solution methods.
arXiv Detail & Related papers (2022-01-28T20:38:01Z) - Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z) - Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
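The two SUNRISE ingredients named above lend themselves to a compact sketch: uncertainty-weighted Bellman targets computed from a Q-ensemble, and UCB-style action selection. The tiny tabular ensemble below is a hypothetical illustration of those ideas, not the paper's deep-RL implementation; the weighting function is a simple sigmoid-of-std variant standing in for SUNRISE's exact formula.

```python
import math
import statistics

# Hedged sketch of SUNRISE's two ingredients on a tiny tabular Q-ensemble.
# Q[i][(state, action)] is the i-th ensemble member's estimate; all
# numbers and names here are illustrative assumptions.

def ucb_action(Q, state, actions, beta=1.0):
    """Select the action maximizing mean + beta * std across the
    ensemble (optimism in the face of ensemble disagreement)."""
    def score(a):
        vals = [q[(state, a)] for q in Q]
        return statistics.mean(vals) + beta * statistics.pstdev(vals)
    return max(actions, key=score)

def backup_weight(Q, state, action, temperature=1.0):
    """Down-weight Bellman targets whose ensemble std is high, via a
    sigmoid of -std/temperature -- mirroring the spirit of SUNRISE's
    uncertainty-weighted Bellman backup."""
    std = statistics.pstdev([q[(state, action)] for q in Q])
    return 1.0 / (1.0 + math.exp(std / temperature))

# Two ensemble members disagree strongly about "right", mildly about "left".
Q = [{("s0", "left"): 1.0, ("s0", "right"): 0.5},
     {("s0", "left"): 1.2, ("s0", "right"): 2.5}]
print(ucb_action(Q, "s0", ["left", "right"]))
```

Under UCB the high-variance "right" action wins (mean 1.5 + std 1.0 beats mean 1.1 + std 0.1), while the backup weight for "right" is lower than for "left", so its noisy target contributes less to the Bellman update.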
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.