Efficient Algorithms for Finite Horizon and Streaming Restless
Multi-Armed Bandit Problems
- URL: http://arxiv.org/abs/2103.04730v1
- Date: Mon, 8 Mar 2021 13:10:31 GMT
- Title: Efficient Algorithms for Finite Horizon and Streaming Restless
Multi-Armed Bandit Problems
- Authors: Aditya Mate, Arpita Biswas, Christoph Siebenbrunner, Milind Tambe
- Abstract summary: We propose a new and scalable approach to computing index-based solutions.
We provide algorithms designed to capture index decay without having to solve the costly finite horizon problem.
Our algorithms achieve a more than 150x speed-up over existing methods in these tasks without loss in performance.
- Score: 30.759279275710078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Restless Multi-Armed Bandits (RMABs) have been popularly used to model
limited resource allocation problems. Recently, these have been employed for
health monitoring and intervention planning problems. However, the existing
approaches fail to account for the arrival of new patients and the departure of
enrolled patients from a treatment program. To address this challenge, we
formulate a streaming bandit (S-RMAB) framework, a generalization of RMABs
where heterogeneous arms arrive and leave under possibly random streams. We
propose a new and scalable approach to computing index-based solutions. We
start by proving that index values decrease for short residual lifetimes, a
phenomenon that we call index decay. We then provide algorithms designed to
capture index decay without having to solve the costly finite horizon problem,
thereby lowering the computational complexity compared to existing methods. We
evaluate our approach via simulations run on real-world data obtained from a
tuberculosis intervention planning task as well as multiple other synthetic
domains. Our algorithms achieve a more than 150x speed-up over existing methods in
these tasks without loss in performance. These findings are robust across
multiple domains.
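Index-based RMAB solutions assign each arm a Whittle index: the passive-action subsidy at which acting and not acting on the arm become equally valuable, so that the planner can simply act on the highest-index arms. A minimal finite-horizon sketch (with a hypothetical two-state arm and made-up transition matrices; this is the costly baseline computation, not the paper's faster algorithms) also illustrates index decay: the index of the bad state shrinks as the residual horizon gets short, since an intervention has less time to pay off.

```python
import numpy as np

# Hypothetical two-state arm: state 1 is "good" (reward 1), state 0 is "bad".
# P[a] is the transition matrix under action a (0 = passive, 1 = active).
P = {
    0: np.array([[0.9, 0.1],    # passive: the state tends to decay
                 [0.4, 0.6]]),
    1: np.array([[0.5, 0.5],    # active: intervention improves the state
                 [0.1, 0.9]]),
}
r = np.array([0.0, 1.0])  # reward depends only on the current state

def q_values(s, horizon, lam):
    """Finite-horizon Q-values of passive/active in state s,
    when a subsidy lam is paid for the passive action."""
    V = np.zeros(2)                   # value with 0 steps remaining
    for _ in range(horizon - 1):      # backward induction
        Q0 = lam + r + P[0] @ V       # passive + subsidy
        Q1 = r + P[1] @ V             # active
        V = np.maximum(Q0, Q1)
    return lam + r[s] + P[0][s] @ V, r[s] + P[1][s] @ V

def whittle_index(s, horizon, lo=-2.0, hi=2.0, tol=1e-6):
    """Binary search for the subsidy equalizing the two actions
    (assumes the arm is indexable, so the gap is monotone in lam)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        q_pass, q_act = q_values(s, horizon, mid)
        if q_pass < q_act:
            lo = mid   # subsidy too small: acting is still preferred
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Index decay: the bad state's index grows with the residual horizon,
# i.e. it is depressed when few rounds remain.
indices = [whittle_index(0, h) for h in (1, 2, 5, 10)]
```

Solving this backward induction per arm, per state, and per residual horizon is exactly the expense the paper's algorithms avoid by exploiting the index-decay structure directly.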
Related papers
- GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits [16.054685587034836]
GINO-Q is a three-timescale approximation algorithm designed to learn an optimal index policy for restless multi-armed bandits (RMABs).
GINO-Q does not require RMABs to be indexable, enhancing its flexibility and applicability.
Our experimental results demonstrate that GINO-Q consistently learns near optimal policies, even for non-indexable RMABs.
arXiv Detail & Related papers (2024-08-19T10:50:45Z)
- QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction [0.0]
We introduce QN-Mixer, an algorithm based on the quasi-Newton approach.
Incept-Mixer is an efficient neural architecture that serves as a non-local regularization term.
Our approach intelligently downsamples information, significantly reducing computational requirements.
arXiv Detail & Related papers (2024-02-28T00:20:25Z)
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
Convergence guarantees and generalizability of the unrolled networks remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of the arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits [30.532795983761314]
We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions.
We first show that Whittle index policies can fail in simple and practically relevant settings.
We then propose an alternate planning algorithm based on the mean-field method.
arXiv Detail & Related papers (2022-10-31T19:35:15Z)
- Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning [2.0625936401496237]
Recent advances in mobile health (mHealth) technology provide an effective way to monitor individuals' health statuses and deliver just-in-time personalized interventions.
The practical use of mHealth technology raises unique challenges to existing methodologies on learning an optimal dynamic treatment regime.
We propose a Proximal Temporal Learning framework to estimate an optimal regime adaptively adjusted between deterministic and sparse policy models.
arXiv Detail & Related papers (2021-10-20T18:38:22Z)
- Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits [61.490254407420906]
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain.
We propose Restless-UCB, a learning policy that follows the explore-then-commit framework.
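The explore-then-commit skeleton that Restless-UCB builds on can be sketched for an ordinary (non-restless) stochastic bandit; this is only the framework's shape, not Restless-UCB itself, which must additionally estimate each arm's Markov chain during exploration:

```python
def explore_then_commit(arms, explore_rounds, total_rounds):
    """Explore-then-commit for a standard stochastic bandit:
    pull arms round-robin for `explore_rounds` steps, then commit
    to the empirically best arm for the remaining rounds.
    `arms` is a list of zero-argument callables returning a reward."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    rewards = []
    for t in range(total_rounds):
        if t < explore_rounds:
            a = t % len(arms)  # uniform round-robin exploration
        else:
            # commit to the arm with the highest empirical mean
            a = max(range(len(arms)),
                    key=lambda i: sums[i] / max(counts[i], 1))
        x = arms[a]()
        counts[a] += 1
        sums[a] += x
        rewards.append(x)
    return rewards

# Two hypothetical arms with fixed payouts, for illustration only.
history = explore_then_commit([lambda: 0.2, lambda: 0.8],
                              explore_rounds=4, total_rounds=20)
```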
arXiv Detail & Related papers (2020-11-05T05:16:04Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z)
- Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
- Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.