Best Arm Identification in Bandits with Limited Precision Sampling
- URL: http://arxiv.org/abs/2305.06082v1
- Date: Wed, 10 May 2023 12:07:48 GMT
- Title: Best Arm Identification in Bandits with Limited Precision Sampling
- Authors: Kota Srinivas Reddy, P. N. Karthik, Nikhil Karamchandani and
Jayakrishnan Nair
- Abstract summary: We study best arm identification in a variant of the multi-armed bandit problem where the learner has limited precision in arm selection.
We propose a modified tracking-based algorithm to handle non-unique optimal allocations.
- Score: 14.011731120150124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study best arm identification in a variant of the multi-armed bandit
problem where the learner has limited precision in arm selection. The learner
can only sample arms via certain exploration bundles, which we refer to as
boxes. In particular, at each sampling epoch, the learner selects a box, which
in turn causes an arm to get pulled as per a box-specific probability
distribution. The pulled arm and its instantaneous reward are revealed to the
learner, whose goal is to find the best arm by minimising the expected stopping
time, subject to an upper bound on the error probability. We present an
asymptotic lower bound on the expected stopping time, which holds as the error
probability vanishes. We show that the optimal allocation suggested by the
lower bound is, in general, non-unique and therefore challenging to track. We
propose a modified tracking-based algorithm to handle non-unique optimal
allocations, and demonstrate that it is asymptotically optimal. We also present
non-asymptotic lower and upper bounds on the stopping time in the simpler
setting when the arms accessible from one box do not overlap with those of
others.
Related papers
- Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances [10.470114319701576]
We propose a strategy that estimates variances during an adaptive experiment and draws arms with a ratio of the estimated standard deviations.
Our results suggest that under the worst-case scenario characterized by the small-gap regime, our strategy, which employs estimated variance, is optimalally even when the variances are unknown.
arXiv Detail & Related papers (2023-12-20T03:28:49Z) - Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a
Fixed Budget [10.470114319701576]
This study investigates the experimental design problem for identifying the arm with the highest expected outcome.
Under the assumption that the variances are known, we propose the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy.
We show that the GNA-EBA strategy is infinitelyally optimal in sense that its probability of misidentification aligns with the lower bounds.
arXiv Detail & Related papers (2023-10-30T17:52:46Z) - Optimal Best Arm Identification with Fixed Confidence in Restless Bandits [66.700654953613]
We study best arm identification in a restless multi-armed bandit setting with finitely many arms.
The discrete-time data generated by each arm forms a homogeneous Markov chain taking values in a common, finite state space.
It is demonstrated that tracking the long-term behavior of a certain Markov decision process and its state-action visitation proportions are the key ingredients in analyzing the converse and achievability bounds.
arXiv Detail & Related papers (2023-10-20T10:04:05Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Beyond the Best: Estimating Distribution Functionals in Infinite-Armed
Bandits [40.71199236098642]
In the infinite-armed bandit problem, each arm's average reward is sampled from an unknown distribution.
We consider a general class of distribution functionals beyond the maximum, and propose unified meta algorithms for both the offline and online settings.
arXiv Detail & Related papers (2022-11-01T18:20:10Z) - Federated Best Arm Identification with Heterogeneous Clients [62.36929749450298]
We study best arm identification in a federated multi-armed bandit setting with a central server and multiple clients.
We show that for any algorithm whose upper bound on the expected stopping time matches with the lower bound up to a multiplicative constant (em almost-optimal algorithm)
We propose a novel algorithm that communicates at exponential time instants, and demonstrate that it is almost-optimal.
arXiv Detail & Related papers (2022-10-14T13:09:11Z) - Best Arm Identification in Restless Markov Multi-Armed Bandits [85.55466536537293]
We study the problem of identifying the best arm in a multi-armed bandit environment.
A decision entity wishes to find the index of the best arm as quickly as possible, subject to an upper bound error probability.
We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $Rtoinfty$.
arXiv Detail & Related papers (2022-03-29T04:58:04Z) - Combinatorial Blocking Bandits with Stochastic Delays [33.65025386998747]
Recent work has considered natural variations of the multi-armed bandit problem, where the reward of each arm is a special function of the time passed since its last pulling.
In this work, we extend the above model in two directions: (i) We consider the general setting where more than one arms can be played at each round, subject to feasibility constraints.
We provide a tight analysis of the approximation of a natural greedy subset that always plays the maximum expected reward feasible among the available (non-blocked) arms.
When the arms' expected rewards are unknown, we adapt the above algorithm into a bandit, based on
arXiv Detail & Related papers (2021-05-22T02:46:04Z) - Online Model Selection: a Rested Bandit Formulation [49.69377391589057]
We introduce and analyze a best arm identification problem in the rested bandit setting.
We define a novel notion of regret for this problem, where we compare to the policy that always plays the arm having the smallest expected loss at the end of the game.
Unlike known model selection efforts in the recent bandit literature, our algorithm exploits the specific structure of the problem to learn the unknown parameters of the expected loss function.
arXiv Detail & Related papers (2020-12-07T08:23:08Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Ballooning Multi-Armed Bandits [12.205797997133397]
We introduce Ballooning Multi-Armed Bandits (BL-MAB)
In BL-MAB, the set of available arms grows (or balloons) over time.
We show that if the best arm is equally likely to arrive at any time instant, a sublinear regret cannot be achieved.
arXiv Detail & Related papers (2020-01-24T04:35:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.