Learning to Detect an Odd Restless Markov Arm with a Trembling Hand
- URL: http://arxiv.org/abs/2105.03603v1
- Date: Sat, 8 May 2021 05:53:12 GMT
- Title: Learning to Detect an Odd Restless Markov Arm with a Trembling Hand
- Authors: P. N. Karthik and Rajesh Sundaresan
- Abstract summary: Anomaly means that the transition probability matrix (TPM) of one of the arms differs from the common TPM shared by the non-odd arms.
We devise a policy based on the principle of certainty equivalence, and demonstrate that under a continuous selection assumption and a certain regularity assumption on the TPMs, the policy achieves the lower bound arbitrarily closely.
Our achievability analysis is based on resolving the identifiability problem in the context of a certain countable-state controlled Markov process.
- Score: 12.467685221424032
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the problem of finding an anomalous arm in a multi-armed
bandit when (a) each arm is a finite-state Markov process, and (b) the arms are
restless. Here, anomaly means that the transition probability matrix (TPM) of
one of the arms (the odd arm) is different from the common TPM of each of the
non-odd arms. The TPMs are unknown to a decision entity that wishes to find the
index of the odd arm as quickly as possible, subject to an upper bound on the
error probability. We derive a problem instance specific asymptotic lower bound
on the expected time required to find the odd arm index, where the asymptotics
is as the error probability vanishes. Further, we devise a policy based on the
principle of certainty equivalence, and demonstrate that under a continuous
selection assumption and a certain regularity assumption on the TPMs, the
policy achieves the lower bound arbitrarily closely. Thus, while the lower
bound is shown for all problem instances, the upper bound is shown only for
those problem instances satisfying the regularity assumption. Our achievability
analysis is based on resolving the identifiability problem in the context of a
certain countable-state controlled Markov process.
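The abstract's ingredients (restless Markov arms, a trembling hand that occasionally overrides the intended arm choice, and a certainty-equivalence decision rule that plugs in empirical TPM estimates) can be illustrated with a toy sketch. This is not the paper's policy: the parameter values, the block-sampling schedule, and the plain distance statistic used in place of a generalized-likelihood-ratio stopping rule are all assumptions made for illustration.

```python
import numpy as np

def detect_odd_arm(K=4, odd=2, S=2, eta=0.05, block=50, horizon=12000, seed=0):
    """Toy sketch of odd-arm detection with a trembling hand (illustrative,
    NOT the paper's policy). Each arm is a 2-state restless Markov chain;
    the odd arm's TPM differs from the common TPM shared by the rest."""
    rng = np.random.default_rng(seed)
    P_common = np.array([[0.9, 0.1], [0.1, 0.9]])   # assumed common TPM
    P_odd = np.array([[0.5, 0.5], [0.5, 0.5]])      # assumed odd TPM
    tpms = [P_odd if k == odd else P_common for k in range(K)]
    states = [0] * K
    counts = [np.ones((S, S)) for _ in range(K)]    # Laplace-smoothed counts
    prev_arm, prev_obs = None, None

    for t in range(horizon):
        intended = (t // block) % K                 # sample arms in blocks
        # trembling hand: with probability eta the intended arm is replaced
        # by a uniformly random one
        arm = intended if rng.random() > eta else int(rng.integers(K))
        for k in range(K):                          # restless: all arms evolve
            states[k] = int(rng.choice(S, p=tpms[k][states[k]]))
        obs = states[arm]
        if arm == prev_arm:                         # consecutive samples give a
            counts[arm][prev_obs, obs] += 1         # valid one-step transition
        prev_arm, prev_obs = arm, obs

    # certainty equivalence: plug in the empirical TPM estimates
    P_hat = [c / c.sum(axis=1, keepdims=True) for c in counts]

    # declare the arm whose estimate is farthest from the average of the rest
    # (a crude stand-in for the paper's likelihood-based statistic)
    def dist(k):
        others = np.mean([P_hat[j] for j in range(K) if j != k], axis=0)
        return float(np.abs(P_hat[k] - others).sum())

    return max(range(K), key=dist)
```

The real policy stops adaptively once a likelihood-based statistic crosses a threshold tied to the target error probability; the sketch instead uses a fixed horizon to keep the plug-in (certainty-equivalence) idea in the foreground.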
Related papers
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
We derive a lower bound on the communication required by any algorithm whose upper bound on the expected stopping time matches the lower bound up to a multiplicative constant (an almost-optimal algorithm).
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R \to \infty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Problem Dependent View on Structured Thresholding Bandit Problems [73.70176003598449]
We investigate the problem-dependent regime in the Thresholding Bandit Problem (TBP).
The objective of the learner is to output, at the end of a sequential game, the set of arms whose means are above a given threshold.
We provide upper and lower bounds for the probability of error in both the concave and monotone settings.
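The TBP objective described above (output the set of arms whose means exceed a threshold after a sequential game) can be sketched minimally. The uniform allocation, Bernoulli arms, and parameter values below are assumptions for illustration, not the algorithms analyzed in that paper.

```python
import numpy as np

def threshold_bandit(means, tau=0.5, pulls_per_arm=2000, seed=0):
    """Minimal sketch of the Thresholding Bandit objective (illustrative):
    after a fixed budget of uniform pulls of Bernoulli arms, output the
    indices of the arms whose empirical means exceed the threshold tau."""
    rng = np.random.default_rng(seed)
    est = [rng.binomial(1, m, size=pulls_per_arm).mean() for m in means]
    return [k for k, e in enumerate(est) if e > tau]
```

With a large enough budget the empirical means concentrate, so arms with true means 0.7 and 0.9 are selected over a threshold of 0.5 while arms at 0.2 and 0.4 are not; the paper's interest is precisely in how the error probability of such a decision scales with the budget and the gaps to the threshold.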
arXiv Detail & Related papers (2021-06-18T15:01:01Z) - From Finite to Countable-Armed Bandits [8.099977107670918]
We consider a bandit problem with countably many arms that belong to a finite set of types.
There is a fixed distribution over types which sets the proportion of each type in the population of arms.
We propose a fully adaptive online learning algorithm that achieves O(log n) distribution-dependent expected cumulative regret after any number of plays n.
arXiv Detail & Related papers (2021-05-22T13:09:50Z) - Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z) - Detecting an Odd Restless Markov Arm with a Trembling Hand [18.122816058828906]
We consider a multi-armed bandit in which each arm is a Markov process evolving on a finite state space.
The transition probability matrix of one of the arms (the odd arm) is different from the common transition probability matrix of all other arms.
A decision maker wishes to identify the odd arm as quickly as possible, while keeping the probability of decision error small.
arXiv Detail & Related papers (2020-05-13T11:27:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.