Rate-optimal Bayesian Simple Regret in Best Arm Identification
- URL: http://arxiv.org/abs/2111.09885v3
- Date: Wed, 26 Jul 2023 00:56:50 GMT
- Title: Rate-optimal Bayesian Simple Regret in Best Arm Identification
- Authors: Junpei Komiyama, Kaito Ariu, Masahiro Kato and Chao Qin
- Abstract summary: We consider best arm identification in the multi-armed bandit problem.
We propose a simple and easy-to-compute algorithm whose leading term matches the lower bound up to a constant factor.
- Score: 11.389780431092914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider best arm identification in the multi-armed bandit problem.
Assuming certain continuity conditions of the prior, we characterize the rate
of the Bayesian simple regret. Differing from Bayesian regret minimization
(Lai, 1987), the leading term in the Bayesian simple regret derives from the
region where the gap between optimal and suboptimal arms is smaller than
$\sqrt{\frac{\log T}{T}}$. We propose a simple and easy-to-compute algorithm
whose leading term matches the lower bound up to a constant factor;
simulation results support our theoretical findings.
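To make the quantity being studied concrete, here is a minimal Monte Carlo sketch (not the paper's algorithm) that estimates the Bayesian simple regret of a uniform-allocation baseline with unit-variance Gaussian rewards and arm means drawn from a uniform prior; the prior, noise level, and horizon are illustrative assumptions.

```python
# Minimal sketch: Monte Carlo estimate of Bayesian simple regret for a
# uniform-allocation baseline. The paper's point is that instances whose
# gap is of order sqrt(log T / T) dominate this quantity.
import numpy as np

rng = np.random.default_rng(0)

def simple_regret_once(T, K=2):
    mu = rng.uniform(0.0, 1.0, size=K)            # arm means drawn from a uniform prior
    pulls = T // K                                 # uniform allocation over arms
    # empirical mean of `pulls` unit-variance samples has std 1/sqrt(pulls)
    means = rng.normal(mu, 1.0 / np.sqrt(pulls))
    recommended = int(np.argmax(means))
    return mu.max() - mu[recommended]              # simple regret of the recommendation

T = 10_000
bayes_simple_regret = np.mean([simple_regret_once(T) for _ in range(5_000)])
print(f"estimated Bayesian simple regret at T={T}: {bayes_simple_regret:.5f}")
print(f"critical gap scale sqrt(log T / T): {np.sqrt(np.log(T) / T):.5f}")
```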
Related papers
- Best Arm Identification with Minimal Regret [55.831935724659175]
The minimal-regret variant of best arm identification elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero; the underlying KL-UCB index is sketched below.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
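As background for the entry above, here is the standard KL-UCB index for Bernoulli rewards, computed by bisection. This is not the paper's Double KL-UCB algorithm, only the classical index such names build on.

```python
# Hedged sketch: the classic KL-UCB index for Bernoulli arms.
import math

def kl_bernoulli(p, q, eps=1e-12):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clamped for stability
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0):
    # largest q >= mean with pulls * KL(mean, q) <= log t + c * log log t
    budget = math.log(max(t, 2)) + c * math.log(max(math.log(max(t, 2)), 1.0))
    lo, hi = mean, 1.0
    for _ in range(50):                            # bisection on q
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

print(kl_ucb_index(mean=0.4, pulls=100, t=1000))   # upper confidence bound on the mean
```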
- Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z)
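To make the cost/accuracy trade-off in the entry above concrete, here is a toy two-fidelity arm model; the bias, noise levels, and costs are invented for illustration and do not come from the paper.

```python
# Illustrative sketch (assumptions, not the paper's method): a two-fidelity
# arm where low-fidelity samples are cheaper but biased and noisier.
import numpy as np

rng = np.random.default_rng(1)

class TwoFidelityArm:
    def __init__(self, mean, bias=0.05, sigma_hi=1.0, sigma_lo=2.0,
                 cost_hi=1.0, cost_lo=0.1):
        self.mean, self.bias = mean, bias
        self.sigma = {"hi": sigma_hi, "lo": sigma_lo}
        self.cost = {"hi": cost_hi, "lo": cost_lo}

    def sample(self, fidelity):
        # low fidelity: cheaper, but the mean estimate is biased and noisier
        shift = 0.0 if fidelity == "hi" else self.bias
        return rng.normal(self.mean + shift, self.sigma[fidelity]), self.cost[fidelity]

arm = TwoFidelityArm(mean=0.7)
total_cost = 0.0
for fid in ["lo"] * 20 + ["hi"] * 5:   # e.g. screen cheaply, then verify at high fidelity
    _, c = arm.sample(fid)
    total_cost += c
print(f"total sampling cost: {total_cost:.1f}")
```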
- Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z)
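A loose sketch of the flavor of such a policy for Gaussian arms, not the paper's exact rule: draw a posterior sample, and if it again proposes the current empirical leader, pull a challenger instead (here, simply the runner-up of the posterior sample).

```python
# Loose sketch (NOT the paper's exact rule): Gaussian Thompson sampling with
# a crude best-challenger fallback when the sample re-selects the leader.
import numpy as np

rng = np.random.default_rng(2)
K, sigma = 4, 1.0
true_means = np.array([0.2, 0.4, 0.5, 0.6])
counts = np.ones(K)                                 # one forced pull per arm
sums = rng.normal(true_means, sigma)

for t in range(2000):
    post_mean = sums / counts
    post_std = sigma / np.sqrt(counts)
    theta = rng.normal(post_mean, post_std)         # posterior sample
    leader = int(np.argmax(post_mean))              # current empirical best
    pick = int(np.argmax(theta))
    if pick == leader:                              # challenge the leader instead
        pick = int(np.argsort(theta)[-2])           # runner-up plays "best challenger"
    sums[pick] += rng.normal(true_means[pick], sigma)
    counts[pick] += 1

print("recommended arm:", int(np.argmax(sums / counts)), "pull counts:", counts)
```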
- Bayesian Fixed-Budget Best-Arm Identification [24.31655036648236]
Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations.
We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm.
arXiv Detail & Related papers (2022-11-15T23:29:51Z)
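A minimal phase-based elimination sketch under assumed Gaussian rewards and flat priors; the survival threshold and phase schedule are illustrative choices, not the paper's algorithm or its analyzed constants.

```python
# Minimal sketch of Bayesian elimination: spend the budget in phases, then
# drop arms whose posterior probability of being optimal is low.
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([0.3, 0.5, 0.55, 0.7])
K, sigma, budget, phases = len(true_means), 1.0, 2000, 3
active = list(range(K))
counts, sums = np.zeros(K), np.zeros(K)

for _ in range(phases):
    per_arm = budget // (phases * len(active))      # uniform spend within the phase
    for a in active:
        x = rng.normal(true_means[a], sigma, size=per_arm)
        counts[a] += per_arm
        sums[a] += x.sum()
    # flat prior => posterior of arm mean is N(empirical mean, sigma^2 / n)
    draws = rng.normal(sums[active] / counts[active],
                       sigma / np.sqrt(counts[active]), size=(4000, len(active)))
    p_best = np.bincount(np.argmax(draws, axis=1), minlength=len(active)) / 4000
    keep = [a for a, p in zip(active, p_best) if p > 0.5 / len(active)]  # heuristic cutoff
    active = keep if len(keep) >= 1 else active

print("surviving arms:", active)
```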
- Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification [5.176134438571082]
We consider the fixed-budget best arm identification problem with rewards following normal distributions.
In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps.
The algorithm's performance is evaluated by simple regret, reflecting the quality of the estimated best arm.
arXiv Detail & Related papers (2022-02-10T17:50:26Z)
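For reference, the simple regret used as the performance measure above is just the gap between the best true mean and the recommended arm's true mean. A tiny worked example with hypothetical values:

```python
# Simple regret of a recommendation: best true mean minus recommended mean.
import numpy as np

true_means = np.array([0.3, 0.5, 0.7])    # K = 3 arms (hypothetical values)
recommended = 1                            # arm the forecaster picked after T steps
simple_regret = true_means.max() - true_means[recommended]
print(simple_regret)                       # 0.7 - 0.5 = 0.2
```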
- Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework [22.846260353176614]
We propose an Enhanced Bayesian Upper Confidence Bound (EBUCB) framework that can efficiently accommodate bandit problems under approximate inference.
We show that EBUCB can achieve the optimal regret order $O(\log T)$ if the inference error measured by two different $\alpha$-divergences is less than a constant.
Our study provides the first theoretical regret bound better than $o(T)$ in the setting of constant approximate inference error.
arXiv Detail & Related papers (2022-01-31T01:36:15Z)
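EBUCB enhances a Bayesian upper-confidence-bound index to tolerate inference error. As a baseline only, here is a standard Bayes-UCB-style index (a posterior quantile) for Bernoulli arms with Beta posteriors; the quantile schedule is one common choice, not the paper's.

```python
# Hedged sketch: a Bayes-UCB style index, the kind of baseline EBUCB enhances.
from scipy.stats import beta

def bayes_ucb_index(successes, failures, t):
    # upper quantile 1 - 1/t of the Beta(1 + s, 1 + f) posterior
    return beta.ppf(1.0 - 1.0 / max(t, 2), 1 + successes, 1 + failures)

print(bayes_ucb_index(successes=12, failures=8, t=100))
```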
- Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the gap-based algorithm's sample complexity is optimal up to constant factors, while the successive elimination algorithm's is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z)
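A compact sketch of classic successive elimination for Gaussian arms, the scheme the entry's second algorithm builds on; the confidence-radius constants are illustrative rather than the paper's.

```python
# Sketch of successive elimination: sample survivors round-robin, then drop
# any arm whose upper confidence bound falls below the best lower bound.
import numpy as np

rng = np.random.default_rng(4)
true_means = np.array([0.2, 0.45, 0.5, 0.8])
K, delta = len(true_means), 0.05
active = list(range(K))
counts, sums, t = np.zeros(K), np.zeros(K), 0

while len(active) > 1 and t < 100_000:
    for a in active:                       # one round-robin pass over survivors
        sums[a] += rng.normal(true_means[a], 1.0)
        counts[a] += 1
        t += 1
    means = sums[active] / counts[active]
    # illustrative anytime confidence radius for unit-variance arms
    rad = np.sqrt(2 * np.log(4 * counts[active] ** 2 / delta) / counts[active])
    best_lcb = np.max(means - rad)
    active = [a for a, m, r in zip(active, means, rad) if m + r >= best_lcb]

print("identified best arm:", active, "samples used:", t)
```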
- Regret Minimization in Heavy-Tailed Bandits [12.272975892517039]
We revisit the classic regret-minimization problem in the multi-armed bandit setting.
We propose an optimal algorithm that matches the lower bound exactly in the first-order term.
We show that our index concentrates faster than the well-known truncated or trimmed empirical mean estimators (both sketched below).
arXiv Detail & Related papers (2021-02-07T07:46:24Z)
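The truncated and trimmed empirical means mentioned above, in their textbook forms (the paper's own index is a different, faster-concentrating estimator):

```python
# Textbook robust mean estimators for heavy-tailed samples: truncation clips
# large observations; trimming discards a fraction of the most extreme ones.
import numpy as np

def truncated_mean(x, threshold):
    return np.mean(np.clip(x, -threshold, threshold))

def trimmed_mean(x, trim_frac=0.1):
    x = np.sort(x)
    k = int(len(x) * trim_frac)
    return np.mean(x[k: len(x) - k]) if len(x) > 2 * k else np.mean(x)

rng = np.random.default_rng(5)
x = rng.standard_t(df=2, size=1000)        # heavy-tailed sample (infinite variance)
print(np.mean(x), truncated_mean(x, 5.0), trimmed_mean(x))
```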
- Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (a worked example follows the entry).
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
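A worked example of the lenient criterion: suboptimality gaps below $\epsilon$ contribute nothing, so near-optimal actions are treated as free. The means, pull counts, and $\epsilon$ below are hypothetical.

```python
# Lenient regret: gaps smaller than eps are forgiven entirely.
import numpy as np

true_means = np.array([0.70, 0.68, 0.40])   # hypothetical arms; arm 1 is nearly optimal
gaps = true_means.max() - true_means        # [0.00, 0.02, 0.30]
pulls = np.array([500, 450, 50])            # how often each arm was played
eps = 0.05

standard_regret = np.sum(pulls * gaps)
lenient_regret = np.sum(pulls * np.where(gaps >= eps, gaps, 0.0))
print(standard_regret, lenient_regret)      # the 0.02 gap is forgiven under lenience
```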
- Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z)
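As background on fixed-confidence stopping, here is a generic Chernoff/GLR-style stopping statistic, written for plain Gaussian arms rather than the paper's linear setting; the threshold beta(t, delta) is an illustrative choice, not the paper's calibrated rule.

```python
# Hedged sketch: stop when the empirical best arm is separated from its
# closest challenger beyond a threshold beta(t, delta).
import numpy as np

def glr_stop(means, counts, sigma, delta, t):
    i_star = int(np.argmax(means))
    z = np.inf
    for j in range(len(means)):
        if j == i_star:
            continue
        gap = means[i_star] - means[j]
        # Gaussian GLR statistic between the leader and challenger j
        z = min(z, gap**2 / (2 * sigma**2 * (1 / counts[i_star] + 1 / counts[j])))
    threshold = np.log((1 + np.log(t)) / delta)   # illustrative beta(t, delta)
    return z > threshold, z

stop, z = glr_stop(np.array([0.7, 0.5, 0.4]), np.array([200., 180., 150.]),
                   sigma=1.0, delta=0.05, t=530)
print(stop, z)
```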
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.