Indexability of Finite State Restless Multi-Armed Bandit and Rollout
Policy
- URL: http://arxiv.org/abs/2305.00410v1
- Date: Sun, 30 Apr 2023 06:53:44 GMT
- Title: Indexability of Finite State Restless Multi-Armed Bandit and Rollout
Policy
- Authors: Vishesh Mittal, Rahul Meshram, Deepak Dev and Surya Prakash
- Abstract summary: We consider the finite-state restless multi-armed bandit problem.
The classical approach to restless bandits is the Whittle index policy.
We propose an alternative approach to verify the indexability criterion for a single-armed bandit model.
- Score: 5.64327489637232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the finite-state restless multi-armed bandit problem. At each
time step, the decision maker can act on M of the N bandits. Playing an arm (an
active arm) yields a reward that depends on its state and action, and an arm that is
not played also yields a reward depending on its state and action. The objective of
the decision maker is to maximize the infinite-horizon discounted reward. The
classical approach to restless bandits is the Whittle index policy, in which the M
arms with the highest indices are played at each time step. The restless bandit
problem is decoupled by analyzing a relaxed, constrained version of it: a Lagrangian
relaxation splits the problem into N single-armed restless bandit problems. We
analyze the single-armed restless bandit and, in order to study the Whittle index
policy, establish structural results for the single-armed bandit model. We define
indexability and prove it in special cases. We propose an alternative approach that
verifies the indexability criterion for a single-armed bandit model using a value
iteration algorithm, and we demonstrate the performance of our algorithm on
different examples. We provide insight into conditions for indexability of restless
bandits under different structural assumptions on the transition probability and
reward matrices. We also study an online rollout policy, discuss its computational
complexity, and compare it with the complexity of index computation. Numerical
examples illustrate that the index policy and the rollout policy perform better than
the myopic policy.
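To make the value-iteration-based indexability check concrete, the following is a minimal sketch of the standard subsidy-sweep procedure for a single-armed restless bandit: for each passive subsidy on an increasing grid, solve the Bellman equation by value iteration and record the set of states where the passive action is optimal; indexability holds if this passive set grows monotonically with the subsidy. The transition matrices, reward vectors, discount factor, and subsidy grid below are illustrative assumptions, not the paper's actual examples or exact algorithm.

```python
import numpy as np

def passive_set(P0, P1, r0, r1, lam, beta=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a single-armed restless bandit with passive subsidy
    `lam`; returns the set of states where the passive action is optimal."""
    n = len(r0)
    V = np.zeros(n)
    for _ in range(max_iter):
        Q_passive = r0 + lam + beta * P0 @ V   # subsidised passive action
        Q_active = r1 + beta * P1 @ V          # active action
        V_new = np.maximum(Q_passive, Q_active)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q_passive = r0 + lam + beta * P0 @ V
    Q_active = r1 + beta * P1 @ V
    return set(np.where(Q_passive >= Q_active)[0])

def check_indexability(P0, P1, r0, r1, lam_grid, beta=0.9):
    """Sweep the subsidy over an increasing grid and verify that the passive
    set grows monotonically -- the defining property of indexability."""
    prev = set()
    for lam in lam_grid:
        cur = passive_set(P0, P1, r0, r1, lam, beta)
        if not prev.issubset(cur):
            return False
        prev = cur
    return True

# Toy 3-state example with made-up transition and reward matrices.
P0 = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]])
P1 = np.array([[0.5, 0.4, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]])
r0 = np.array([0.0, 0.2, 0.4])
r1 = np.array([1.0, 0.8, 0.5])
print(check_indexability(P0, P1, r0, r1, np.linspace(-2.0, 2.0, 41)))
```

In this sketch, the Whittle index of a state can then be read off as the smallest subsidy at which that state joins the passive set; a per-state bisection over the subsidy is the usual refinement of the coarse grid sweep.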
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z)
- Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z)
- Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply an $O(d\sqrt{T}\log T)$ high-probability regret bound for contextual linear bandits.
arXiv Detail & Related papers (2022-11-08T22:18:53Z)
- Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits [8.136957953239254]
We study a finite-horizon restless multi-armed bandit problem with multiple actions dubbed R(MA)2B.
The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken.
Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy.
arXiv Detail & Related papers (2021-09-20T21:40:12Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)