Fixed-Budget Best-Arm Identification in Structured Bandits
- URL: http://arxiv.org/abs/2106.04763v8
- Date: Tue, 4 Jul 2023 19:22:17 GMT
- Title: Fixed-Budget Best-Arm Identification in Structured Bandits
- Authors: Mohammad Javad Azizi, Branislav Kveton and Mohammad Ghavamzadeh
- Abstract summary: Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations.
We propose a general tractable algorithm that incorporates the structure, by eliminating suboptimal arms based on their mean reward estimates from a joint generalization model.
- Score: 33.27743152847947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Best-arm identification (BAI) in a fixed-budget setting is a bandit problem
where the learning agent maximizes the probability of identifying the optimal
(best) arm after a fixed number of observations. Most works on this topic study
unstructured problems with a small number of arms, which limits their
applicability. We propose a general tractable algorithm that incorporates the
structure, by successively eliminating suboptimal arms based on their mean
reward estimates from a joint generalization model. We analyze our algorithm in
linear and generalized linear models (GLMs), and propose a practical
implementation based on a G-optimal design. In linear models, our algorithm has
competitive error guarantees to prior works and performs at least as well
empirically. In GLMs, this is the first practical algorithm with analysis for
fixed-budget BAI.
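As a rough illustration of the approach the abstract describes, the sketch below combines stagewise successive elimination with a Frank-Wolfe approximation of a G-optimal design in a linear model. The function names (`g_optimal_weights`, `successive_elimination_bai`), the number of stages, and the allocation details are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def g_optimal_weights(X, iters=200):
    """Frank-Wolfe sketch of a G-optimal design over arm features X (K x d):
    returns sampling weights that roughly minimize the worst-case
    prediction variance max_k x_k^T V(w)^{-1} x_k."""
    K, d = X.shape
    w = np.full(K, 1.0 / K)
    for t in range(iters):
        V = X.T @ (w[:, None] * X) + 1e-9 * np.eye(d)
        var = np.einsum("kd,de,ke->k", X, np.linalg.inv(V), X)  # per-arm variance
        j = int(np.argmax(var))                                 # most uncertain arm
        step = 2.0 / (t + 2.0)                                  # standard FW step size
        w = (1.0 - step) * w
        w[j] += step
    return w

def successive_elimination_bai(X, pull, budget):
    """Fixed-budget BAI sketch for a linear model: split the budget over
    ~log2(K) stages; in each stage, sample surviving arms per a G-optimal
    design, fit least squares, and eliminate the worse half by estimated mean."""
    active = list(range(len(X)))
    stages = max(1, int(np.ceil(np.log2(len(X)))))
    per_stage = budget // stages
    for _ in range(stages):
        if len(active) == 1:
            break
        w = g_optimal_weights(X[active])
        counts = np.maximum(1, np.floor(w * per_stage)).astype(int)
        feats, ys = [], []
        for arm, n in zip(active, counts):
            for _ in range(n):
                feats.append(X[arm])
                ys.append(pull(arm))
        theta = np.linalg.lstsq(np.array(feats), np.array(ys), rcond=None)[0]
        est = X[active] @ theta
        order = np.argsort(est)                    # ascending by estimated mean
        active = [active[i] for i in order[len(active) // 2:]]
    return active[-1]
```

With noiseless rewards the least-squares fit recovers the true parameter exactly, so the best arm survives every halving stage; with noise, the G-optimal allocation keeps the per-arm estimation error uniformly controlled.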
Related papers
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward, to a specified accuracy, as fast as possible.
We study multi-fidelity best-arm identification, in which the learner can choose to sample an arm at a lower fidelity (a less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
In particular, we present sred, a truly adaptive algorithm that can reject arms at any round based on the observed empirical gaps between the rewards of various arms.
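As a hedged sketch of what "rejecting arms at any round based on observed empirical gaps" could look like: the round-robin sampling and the shrinking threshold below are illustrative assumptions, not the sred algorithm's actual rejection rule.

```python
import math

def adaptive_reject_bai(pull, K, budget, c=2.0):
    """Gap-based adaptive rejection sketch for fixed-budget BAI (assumes
    budget >= K). Arms are pulled round-robin; an arm is dropped as soon as
    its empirical gap to the current leader exceeds a shrinking threshold
    c * sqrt(log(budget) / n_i), which is an illustrative choice."""
    counts = [0] * K
    sums = [0.0] * K
    active = set(range(K))
    t = 0
    while t < budget and len(active) > 1:
        for i in list(active):          # one round-robin pass over survivors
            if t >= budget:
                break
            sums[i] += pull(i)
            counts[i] += 1
            t += 1
        means = {i: sums[i] / counts[i] for i in active}
        leader = max(means, key=means.get)
        for i in list(active):          # reject any arm whose gap is too large
            gap = means[leader] - means[i]
            if i != leader and gap > c * math.sqrt(math.log(budget + 1) / counts[i]):
                active.discard(i)
    means = {i: sums[i] / counts[i] for i in active}
    return max(means, key=means.get)
```

Unlike fixed elimination schedules, this style of rule can drop a clearly bad arm early and reallocate its samples to the remaining contenders.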
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Online Clustering of Bandits with Misspecified User Models [42.56440072468658]
Contextual linear bandit is an online learning problem where given arm features, a learning agent selects an arm at each round to maximize the cumulative rewards in the long run.
A line of works, called the clustering of bandits (CB), utilize the collaborative effect over user preferences and have shown significant improvements over classic linear bandit algorithms.
In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM).
We devise two robust CB algorithms, RCLUMB and RSCLUMB, that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications.
arXiv Detail & Related papers (2023-10-04T10:40:50Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward
Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best-arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of the successive-elimination algorithm is optimal up to logarithmic factors.
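One way a mean-based algorithm can tolerate reward contamination is to plug a robust estimate of the mean into successive elimination. The trimmed-mean choice below is an illustrative assumption, not the paper's estimator or algorithm.

```python
import numpy as np

def trimmed_mean(x, alpha=0.1):
    """alpha-trimmed mean: drop the top and bottom alpha fraction of samples
    before averaging -- one robust 'mean-based' estimator for contaminated rewards."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(alpha * len(x))
    core = x[k: len(x) - k] if len(x) > 2 * k else x
    return float(core.mean())

def robust_successive_elimination(pull, K, budget, alpha=0.1):
    """Successive elimination with trimmed-mean estimates: halve the active
    set each stage, splitting the per-stage budget equally among survivors."""
    active = list(range(K))
    stages = max(1, int(np.ceil(np.log2(K))))
    per_stage = budget // stages
    samples = {i: [] for i in range(K)}
    for _ in range(stages):
        if len(active) == 1:
            break
        n = max(1, per_stage // len(active))
        for i in active:
            samples[i].extend(pull(i) for _ in range(n))
        est = {i: trimmed_mean(samples[i], alpha) for i in active}
        active.sort(key=lambda i: est[i])      # ascending by robust estimate
        active = active[len(active) // 2:]     # keep the better half
    return max(active, key=lambda i: trimmed_mean(samples[i], alpha))
```

Trimming discards the extreme observations an adversary would use to corrupt a plain empirical mean, at the cost of a mild loss in sample efficiency on clean data.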
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Towards Minimax Optimal Best Arm Identification in Linear Bandits [95.22854522340938]
We study the problem of best arm identification in linear bandits in the fixed-budget setting.
By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, OD-LinBAI.
We provide a theoretical analysis of the failure probability of OD-LinBAI.
arXiv Detail & Related papers (2021-05-27T09:19:10Z) - Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit
Feedback [51.21673420940346]
Combinatorial bandits generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set.
We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set.
Based on a projection-free online learning algorithm for finite polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
arXiv Detail & Related papers (2021-01-21T10:35:09Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sample complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - An Empirical Process Approach to the Union Bound: Practical Algorithms
for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.