Related papers: Bayesian Fixed-Budget Best-Arm Identification

Bayesian Fixed-Budget Best-Arm Identification

URL: http://arxiv.org/abs/2211.08572v3
Date: Thu, 15 Jun 2023 13:29:25 GMT
Title: Bayesian Fixed-Budget Best-Arm Identification
Authors: Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton
Abstract summary: Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm.
Score: 24.31655036648236
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.

Related papers

Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback. This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc. Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z)
UCB Exploration for Fixed-Budget Bayesian Best Arm Identification [0.0]
We study best-arm identification (BAI) in the fixed-budget setting. We propose an UCB exploration algorithm that is both theoretically and empirically efficient for the fixed budget BAI problem under a Bayesian setting.
arXiv Detail & Related papers (2024-08-09T05:15:36Z)
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget [6.22018632187078]
We introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget. The goal is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$. We provide a theoretical upper bound on the mis-identification (of the the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$.
arXiv Detail & Related papers (2024-05-23T22:35:11Z)
Fixed Confidence Best Arm Identification in the Bayesian Setting [6.083234045523298]
We consider the fixed-confidence best arm identification (FC-BAI) problem in the Bayesian setting. This problem aims to find the arm of the largest mean with a fixed confidence level when the bandit model has been sampled from the known prior.
arXiv Detail & Related papers (2024-02-16T03:36:03Z)
Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms. In particular, we present sred, a truly adaptive algorithm that can reject arms in it any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances. We propose and theoretically analyze an LUCB-style algorithm to identify an $epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z)
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on the successive elimination, for best arm identification in sub-Gaussian bandits. Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandit [1.5469452301122177]
We consider the problem of finding, through adaptive sampling, which of n populations (arms) has the largest mean. Our objective is to determine a rule which identifies the best population with a fixed minimum confidence using as few observations as possible.
arXiv Detail & Related papers (2021-06-12T20:05:29Z)
Towards Minimax Optimal Best Arm Identification in Linear Bandits [95.22854522340938]
We study the problem of best arm identification in linear bandits in the fixed-budget setting. By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm. We provide a theoretical analysis of the failure probability of OD-LinBAI.
arXiv Detail & Related papers (2021-05-27T09:19:10Z)
Quantile Bandits for Best Arms Identification [10.294977861990203]
We consider a variant of the best arm identification task in multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify a set of $m$ arms with the highest $tau$-quantile values within a fixed budget.
arXiv Detail & Related papers (2020-10-22T09:58:54Z)
Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds. Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.