Restless Multi-Armed Bandits under Exogenous Global Markov Process
- URL: http://arxiv.org/abs/2202.13665v1
- Date: Mon, 28 Feb 2022 10:29:42 GMT
- Title: Restless Multi-Armed Bandits under Exogenous Global Markov Process
- Authors: Tomer Gafni, Michal Yemini, Kobi Cohen
- Abstract summary: We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics.
Under each global state, the rewards process of each arm evolves according to an unknown Markovian rule.
We develop the Learning under Exogenous Markov Process (LEMP) algorithm, which achieves a regret of logarithmic order in time.
- Score: 20.58296570065978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider an extension to the restless multi-armed bandit (RMAB) problem
with unknown arm dynamics, where an unknown exogenous global Markov process
governs the rewards distribution of each arm. Under each global state, the
rewards process of each arm evolves according to an unknown Markovian rule,
which is non-identical among different arms. At each time, a player chooses an
arm out of N arms to play, and receives a random reward from a finite set of
reward states. The arms are restless, that is, their local state evolves
regardless of the player's actions. The objective is an arm-selection policy
that minimizes the regret, defined as the reward loss with respect to a player
that knows the dynamics of the problem, and plays at each time t the arm that
maximizes the expected immediate value. We develop the Learning under Exogenous
Markov Process (LEMP) algorithm, show that it achieves a regret of logarithmic
order in time, and establish a finite-sample bound on the regret. Simulation
results support the theoretical analysis and demonstrate the strong performance
of LEMP.
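To make the setting above concrete, the following Python sketch simulates an exogenous global Markov chain that modulates per-arm Markovian reward processes, and compares a placeholder uniform-random policy against a genie comparator. The state-space sizes, the randomly drawn transition matrices, and the choice of comparator (the arm with the largest stationary mean reward under the current global state, one possible reading of the abstract's "expected immediate value") are all illustrative assumptions; the sketch does not implement LEMP.

```python
# Hypothetical toy simulation of the model described in the abstract.
# Sizes, transition matrices, and the genie comparator are illustrative
# assumptions; this sketch does not implement the paper's LEMP algorithm.
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic_matrix(n):
    """Random row-stochastic matrix, standing in for unknown dynamics."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

N_ARMS, N_GLOBAL, N_LOCAL = 3, 2, 4             # arms, global states, reward states
reward_values = np.linspace(0.0, 1.0, N_LOCAL)  # finite set of reward values

# Exogenous global Markov chain (its dynamics are unknown to the player).
P_global = random_stochastic_matrix(N_GLOBAL)

# For every (arm, global state): a transition matrix over that arm's reward states.
P_local = np.array([[random_stochastic_matrix(N_LOCAL)
                     for _ in range(N_GLOBAL)] for _ in range(N_ARMS)])

def stationary(P, iters=1000):
    """Stationary distribution of a row-stochastic matrix via power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

def simulate(T=10_000):
    """Run a uniform-random placeholder policy and compare it against a genie
    that knows the dynamics and, under each global state, plays the arm with
    the largest stationary mean reward (an assumption made for this sketch)."""
    g = rng.integers(N_GLOBAL)                    # current global state
    local = rng.integers(N_LOCAL, size=N_ARMS)    # current local state of each arm
    # Expected reward of each arm under each global state (known only to the genie).
    mean_reward = np.array([[stationary(P_local[a, s]) @ reward_values
                             for s in range(N_GLOBAL)] for a in range(N_ARMS)])
    player_total, genie_total = 0.0, 0.0
    for _ in range(T):
        arm = rng.integers(N_ARMS)                # placeholder for a learning policy
        player_total += reward_values[local[arm]]
        genie_total += mean_reward[:, g].max()
        # Restless dynamics: every arm's local state evolves regardless of the
        # action, with transitions selected by the current global state.
        for a in range(N_ARMS):
            local[a] = rng.choice(N_LOCAL, p=P_local[a, g, local[a]])
        g = rng.choice(N_GLOBAL, p=P_global[g])
    return genie_total - player_total             # empirical regret of the random policy

print("empirical regret of the random policy over 10k steps:", simulate())
```

A learning policy such as LEMP would replace the random arm choice above; the abstract's logarithmic-regret claim concerns how this gap grows with the horizon for LEMP, not for the random baseline shown here.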
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z)
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of multi-player multi-armed bandit (MAB) model, which captures arrival of requests to each arm and the policy of allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z)
- Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits [6.732901486505047]
The multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by applications to cognitive radio systems.
This paper proposes a multi-player multi-armed walking bandits model, aiming to address the aforementioned modeling issues.
arXiv Detail & Related papers (2022-12-12T23:26:02Z)
- Information-Gathering in Latent Bandits [79.6953033727455]
We present a method for information-gathering in latent bandits.
We show that choosing the best arm given the agent's beliefs about the states incurs higher regret.
We also show that by choosing arms carefully, we obtain an improved estimation of the state distribution.
arXiv Detail & Related papers (2022-07-08T01:15:12Z)
- Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z)
- Learning in Restless Bandits under Exogenous Global Markov Process [13.836565669337057]
We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics.
Under each global state, the rewards process of each arm evolves according to an unknown Markovian rule.
We develop the Learning under Exogenous Markov Process (LEMP) algorithm to minimize regret.
arXiv Detail & Related papers (2021-12-17T12:47:30Z)
- Detecting an Odd Restless Markov Arm with a Trembling Hand [18.122816058828906]
We consider a multi-armed bandit in which each arm is a Markov process evolving on a finite state space.
The transition probability matrix of one of the arms (the odd arm) is different from the common transition probability matrix of all other arms.
A decision maker wishes to identify the odd arm as quickly as possible, while keeping the probability of decision error small.
arXiv Detail & Related papers (2020-05-13T11:27:14Z)
- Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)