Global Rewards in Restless Multi-Armed Bandits
- URL: http://arxiv.org/abs/2406.00738v2
- Date: Fri, 7 Jun 2024 20:38:51 GMT
- Title: Global Rewards in Restless Multi-Armed Bandits
- Authors: Naveen Raman, Zheyuan Ryan Shi, Fei Fang
- Abstract summary: Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states.
Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms.
We propose restless multi-armed bandits with global rewards (RMAB-G), a generalization of RMABs to global, non-separable rewards.
- Score: 37.918982196934216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless multi-armed bandits with global rewards (RMAB-G), a generalization of RMABs to global, non-separable rewards. To solve RMAB-G, we develop the Linear- and Shapley-Whittle indices, which extend Whittle indices from RMABs to RMAB-Gs. We prove approximation bounds but also point out how these indices could fail when reward functions are highly non-linear. To overcome this, we propose two sets of adaptive policies: the first computes indices iteratively, and the second combines indices with Monte-Carlo Tree Search (MCTS). Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue.
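The Shapley-Whittle idea of attributing a non-separable global reward back to individual arms can be illustrated with a standard Monte-Carlo Shapley estimator. This is a hedged sketch, not the paper's algorithm: the coverage-style `reward` function, the arm labels, and the permutation-sampling estimator are illustrative assumptions; in the paper, per-arm values of this kind would feed into Whittle-index computations.

```python
import random

def shapley_values(arms, global_reward, n_samples=2000, seed=0):
    """Monte-Carlo Shapley estimate: average each arm's marginal
    contribution to the global reward over random orderings of arms."""
    rng = random.Random(seed)
    phi = {a: 0.0 for a in arms}
    for _ in range(n_samples):
        order = list(arms)
        rng.shuffle(order)
        chosen, prev = set(), global_reward(set())
        for a in order:
            chosen.add(a)
            cur = global_reward(chosen)
            phi[a] += cur - prev  # marginal contribution of arm a
            prev = cur
    return {a: v / n_samples for a, v in phi.items()}

# Toy global reward: coverage (size of the union of items reached),
# which is non-separable across arms -- the setting RMAB-G targets.
coverage = {0: {"x"}, 1: {"x", "y"}, 2: {"z"}}

def reward(subset):
    covered = set()
    for a in subset:
        covered |= coverage[a]
    return float(len(covered))

phi = shapley_values([0, 1, 2], reward)
print(phi)  # arm 1 covers the most unique items, so it gets the largest share
```

By construction, the estimated Shapley values sum to the reward of pulling all arms, so they act as a principled linear surrogate for a non-separable objective.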
Related papers
- Fairness of Exposure in Online Restless Multi-armed Bandits [8.071147275221973]
Restless multi-armed bandits (RMABs) generalize multi-armed bandits: each arm exhibits Markovian behavior and evolves according to its own transition dynamics.
We show that our algorithm achieves sublinear fairness regret in the single-pull case, $O(\sqrt{T \ln T})$, with $T$ being the total number of episodes. Empirically, we show that our algorithm performs well in the multi-pull scenario as well.
arXiv Detail & Related papers (2024-02-09T11:53:27Z) - Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints [17.403031677689427]
We introduce a new RMAB model with "long-term fairness constraints"
For the online RMAB-F setting, the underlying MDPs associated with each arm are unknown to the DM.
We prove that Fair-UCRL ensures probabilistic sublinear bounds on both the reward regret and the fairness violation regret.
arXiv Detail & Related papers (2023-12-16T03:35:56Z) - An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in the multi-armed bandit setting (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Limited Resource Allocation in a Non-Markovian World: The Case of Maternal and Child Healthcare [27.812174610119452]
We consider the problem of scheduling interventions in low resource settings to increase adherence and/or engagement.
Past works have successfully developed several classes of Restless Multi-armed Bandit (RMAB) based solutions for this problem.
We demonstrate significant deviations from the Markov assumption on real-world data on a maternal health awareness program from our partner NGO, ARMMAN.
To tackle the generalised non-Markovian RMAB setting we (i) model each participant's trajectory as a time-series, (ii) leverage the power of time-series forecasting models to predict future states, and (iii) propose the Time
arXiv Detail & Related papers (2023-05-22T02:26:29Z) - Contextual bandits with concave rewards, and an application to fair ranking [108.48223948875685]
We present the first algorithm with provably vanishing regret for Contextual Bandits with Concave Rewards (CBCR)
We derive a novel reduction from the CBCR regret to the regret of a scalar-reward problem.
Motivated by fairness in recommendation, we describe a special case of CBCR with rankings and fairness-aware objectives.
arXiv Detail & Related papers (2022-10-18T16:11:55Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB)
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning [31.515757763077065]
We introduce Robust Restless Bandits, a generalization of restless multi-armed bandits (RMABs)
We develop solutions for a minimax regret objective when transitions are given by interval uncertainties.
We introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs.
arXiv Detail & Related papers (2021-07-04T17:21:26Z) - DORB: Dynamically Optimizing Multiple Rewards with Bandits [101.68525259222164]
Policy-based reinforcement learning has proven to be a promising approach for optimizing non-differentiable evaluation metrics for language generation tasks.
We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit)
We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks.
arXiv Detail & Related papers (2020-11-15T21:57:47Z) - Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits [0.0]
We consider a restless multi-armed bandit (RMAB) model with a multi-dimensional state space and multiple actions per arm.
We first analyze a standard indexable RMAB (two-action model) and discuss an index-based policy approach.
We present an approximate index algorithm using a Monte-Carlo rollout policy.
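The Monte-Carlo rollout idea is generic enough to sketch: estimate the value of each action in a state by simulating trajectories under a fixed base policy, then compare. This is a hedged illustration of rollout evaluation, not the paper's index algorithm; the two-state toy arm and all transition probabilities are assumptions.

```python
import random

def rollout_value(step, state, action, base_policy, horizon=20, n=200, seed=0):
    """Monte-Carlo rollout estimate of Q(state, action): take `action`
    once, then follow `base_policy` for `horizon` steps, averaged over
    `n` simulated trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s, a, ret = state, action, 0.0
        for _ in range(horizon):
            s, r = step(s, a, rng)
            ret += r
            a = base_policy(s)
        total += ret
    return total / n

# Toy two-state arm: acting (a=1) from state 0 usually moves it to the
# rewarding state 1; state 1 decays back to 0 with probability 0.3.
def step(s, a, rng):
    if s == 0:
        s = 1 if (a == 1 and rng.random() < 0.8) else 0
    else:
        s = 0 if rng.random() < 0.3 else 1
    return s, float(s == 1)

q_act = rollout_value(step, 0, 1, lambda s: 0)    # act once, then passive
q_pass = rollout_value(step, 0, 0, lambda s: 0)   # always passive
print(q_act, q_pass)
```

Comparing such rollout estimates for the active and passive actions, as a function of a per-pull subsidy, is one simulation-based route to approximating an index.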
arXiv Detail & Related papers (2020-07-25T13:50:08Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
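To see why the mode can be the right target, here is a hedged toy sketch: a greedy bandit that scores each arm by the empirical mode of its observed rewards. The paper's algorithms add confidence bounds around the mode estimate; the reward distributions below are illustrative assumptions.

```python
import random
from collections import Counter

def mode_bandit(pull, n_arms, horizon, explore=3, seed=0):
    """Greedy bandit on the empirical mode: after a few forced
    exploration pulls per arm, repeatedly pull the arm whose observed
    rewards have the largest mode, then return that arm."""
    rng = random.Random(seed)
    obs = [[] for _ in range(n_arms)]
    for a in range(n_arms):
        for _ in range(explore):        # forced exploration pulls
            obs[a].append(pull(a, rng))
    for _ in range(horizon - explore * n_arms):
        modes = [Counter(o).most_common(1)[0][0] for o in obs]
        a = max(range(n_arms), key=modes.__getitem__)
        obs[a].append(pull(a, rng))
    modes = [Counter(o).most_common(1)[0][0] for o in obs]
    return max(range(n_arms), key=modes.__getitem__)

# Arm 0: 10 with prob 0.3, else 0 -> mean 3.0 but mode 0.
# Arm 1: always 2            -> mean 2.0 and mode 2.
# A mean-based bandit prefers arm 0; the mode-based one prefers arm 1.
pull = lambda a, rng: (10.0 if rng.random() < 0.3 else 0.0) if a == 0 else 2.0
print(mode_bandit(pull, 2, 200))
```

This also hints at the robustness claim: a few adversarially perturbed observations can move the empirical mean a lot, but barely move the empirical mode.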
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.