Indexability of Finite State Restless Multi-Armed Bandit and Rollout
Policy
- URL: http://arxiv.org/abs/2305.00410v1
- Date: Sun, 30 Apr 2023 06:53:44 GMT
- Title: Indexability of Finite State Restless Multi-Armed Bandit and Rollout
Policy
- Authors: Vishesh Mittal, Rahul Meshram, Deepak Dev and Surya Prakash
- Abstract summary: We consider the finite-state restless multi-armed bandit problem.
The classical approach to restless bandits is the Whittle index policy.
We propose an alternative approach to verify the indexability criterion for a single-armed bandit model.
- Score: 5.64327489637232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the finite-state restless multi-armed bandit problem. At each
time step, the decision maker can act on M of the N bandits. Playing an arm (an
active arm) yields a reward that depends on its state and action, and an arm that is
not played also yields a reward depending on its state and action. The objective of
the decision maker is to maximize the infinite-horizon discounted reward. The
classical approach to restless bandits is the Whittle index policy, in which the M
arms with the highest indices are played at each time step. The restless bandit
problem is decoupled by analyzing a relaxed, constrained version of it: a Lagrangian
relaxation splits the problem into N single-armed restless bandit problems. We
analyze the single-armed restless bandit and, in order to study the Whittle index
policy, establish structural results for the single-armed bandit model. We define
indexability and prove it in special cases. We propose an alternative approach that
verifies the indexability criterion for a single-armed bandit model using a value
iteration algorithm, and we demonstrate the performance of our algorithm on
different examples. We provide insight into conditions for indexability of restless
bandits under different structural assumptions on the transition probability and
reward matrices. We also study an online rollout policy, discuss its computational
complexity, and compare it with the complexity of index computation. Numerical
examples illustrate that the index policy and the rollout policy perform better than
the myopic policy.
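To make the value-iteration-based indexability check concrete, the following is a minimal sketch of the standard subsidy-sweep procedure for a single-armed restless bandit: for each passive subsidy on an increasing grid, solve the Bellman equation by value iteration and record the set of states where the passive action is optimal; indexability holds if this passive set grows monotonically with the subsidy. The transition matrices, reward vectors, discount factor, and subsidy grid below are illustrative assumptions, not the paper's actual examples or exact algorithm.

```python
import numpy as np

def passive_set(P0, P1, r0, r1, lam, beta=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a single-armed restless bandit with passive subsidy
    `lam`; returns the set of states where the passive action is optimal."""
    n = len(r0)
    V = np.zeros(n)
    for _ in range(max_iter):
        Q_passive = r0 + lam + beta * P0 @ V   # subsidised passive action
        Q_active = r1 + beta * P1 @ V          # active action
        V_new = np.maximum(Q_passive, Q_active)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q_passive = r0 + lam + beta * P0 @ V
    Q_active = r1 + beta * P1 @ V
    return set(np.where(Q_passive >= Q_active)[0])

def check_indexability(P0, P1, r0, r1, lam_grid, beta=0.9):
    """Sweep the subsidy over an increasing grid and verify that the passive
    set grows monotonically -- the defining property of indexability."""
    prev = set()
    for lam in lam_grid:
        cur = passive_set(P0, P1, r0, r1, lam, beta)
        if not prev.issubset(cur):
            return False
        prev = cur
    return True

# Toy 3-state example with made-up transition and reward matrices.
P0 = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]])
P1 = np.array([[0.5, 0.4, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]])
r0 = np.array([0.0, 0.2, 0.4])
r1 = np.array([1.0, 0.8, 0.5])
print(check_indexability(P0, P1, r0, r1, np.linspace(-2.0, 2.0, 41)))
```

In this sketch, the Whittle index of a state can then be read off as the smallest subsidy at which that state joins the passive set; a per-state bisection over the subsidy is the usual refinement of the coarse grid sweep.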
Related papers
- Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting [67.1631453378926]
Graph-Triggered Bandits is a framework to generalize rested and restless bandits.
In this work, we focus on two specific types of monotonic bandits: rising, where the expected reward of an arm grows as the number of triggers increases, and rotting, where the opposite behavior occurs.
arXiv Detail & Related papers (2024-09-09T18:23:07Z)
- Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z)
- Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply an $O(d\sqrt{T}\log T)$ high-probability regret bound for contextual linear bandits.
arXiv Detail & Related papers (2022-11-08T22:18:53Z)
- Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits [8.136957953239254]
We study a finite-horizon restless multi-armed bandit problem with multiple actions dubbed R(MA)2B.
The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken.
Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy which we call Occupancy-Measured-Reward Index Policy.
arXiv Detail & Related papers (2021-09-20T21:40:12Z)
- Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z)
- Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback [32.62857394584907]
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback.
We propose adaptive algorithms for both the adversarial and non-adversarial cases.
arXiv Detail & Related papers (2020-12-13T12:25:41Z)
- A Novel Confidence-Based Algorithm for Structured Bandits [129.30402124516507]
We study finite-armed bandits where the rewards of each arm might be correlated to those of other arms.
We introduce a novel phased algorithm that exploits the given structure to build confidence sets over the parameters of the true bandit problem.
arXiv Detail & Related papers (2020-05-23T19:52:44Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)
- The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z)