Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification
- URL: http://arxiv.org/abs/2202.05193v3
- Date: Mon, 15 Apr 2024 02:46:34 GMT
- Title: Suboptimal Performance of the Bayes Optimal Algorithm in Frequentist Best Arm Identification
- Authors: Junpei Komiyama
- Abstract summary: We consider the fixed-budget best arm identification problem with rewards following normal distributions.
In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps.
The algorithm's performance is evaluated by simple regret, reflecting the quality of the estimated best arm.
- Score: 5.176134438571082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the fixed-budget best arm identification problem with rewards following normal distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the arm with the largest mean, via an adaptive experiment conducted using an algorithm. The algorithm's performance is evaluated by simple regret, reflecting the quality of the estimated best arm. While frequentist simple regret can decrease exponentially with respect to $T$, Bayesian simple regret decreases polynomially. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not yield an exponential decrease in simple regret under certain parameter settings. This contrasts with the numerous findings that suggest the asymptotic equivalence of Bayesian and frequentist approaches in fixed sampling regimes. Although the Bayes optimal algorithm is formulated as a recursive equation that is virtually impossible to compute exactly, we lay the groundwork for future research by introducing a novel concept termed the expected Bellman improvement.
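To make the setting concrete, below is a minimal simulation sketch of fixed-budget best arm identification with normal rewards, scored by frequentist simple regret. The allocation rule is plain uniform (round-robin) sampling, not the Bayes optimal algorithm the paper analyzes (which, per the abstract, is virtually impossible to compute exactly); the arm means, variance, budget, and run count are illustrative assumptions.
```python
# Fixed-budget best arm identification with normal rewards, as in the
# abstract: K arms, budget T, and the recommendation is scored by simple
# regret (gap between the best mean and the recommended arm's mean).
# Uniform allocation stands in for the intractable Bayes optimal rule;
# all numbers below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

K, T = 4, 200                             # arms and sampling budget
means = np.array([1.0, 0.8, 0.5, 0.2])    # hypothetical arm means
sigma = 1.0                               # common reward std dev

def run_once():
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        a = t % K                         # round-robin allocation
        counts[a] += 1
        sums[a] += rng.normal(means[a], sigma)
    recommended = int(np.argmax(sums / counts))   # empirically best arm
    return means.max() - means[recommended]       # simple regret

# Frequentist simple regret, estimated by Monte Carlo over many runs.
print(f"estimated simple regret: {np.mean([run_once() for _ in range(2000)]):.4f}")
```
Even this naive allocation has frequentist simple regret decaying exponentially in $T$; the paper's point is that the Bayes optimal algorithm can fail to achieve such exponential decay under certain parameter settings.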
Related papers
- Near-Optimal Algorithm for Non-Stationary Kernelized Bandits [6.379833644595456]
We study a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization.
We show the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Matérn kernels.
We propose a novel near-optimal algorithm called restarting phased elimination with random permutation.
arXiv Detail & Related papers (2024-10-21T14:28:26Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm has access only to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling [44.921905700729766]
Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms.
This motivated prior works on variance-adaptive frequentist algorithms, which have strong instance-dependent regret bounds.
We lay foundations for the Bayesian setting, which incorporates prior knowledge.
arXiv Detail & Related papers (2023-03-16T02:07:29Z) - Rate-optimal Bayesian Simple Regret in Best Arm Identification [11.389780431092914]
We consider best arm identification in the multi-armed bandit problem.
We propose a simple and easy-to-compute algorithm whose leading term matches the lower bound up to a constant factor.
arXiv Detail & Related papers (2021-11-18T18:59:35Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, for the gap-based algorithm, the sample complexity is optimal up to constant factors, while for the successive elimination algorithm, it is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (a worked sketch of this criterion appears after this list).
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm (a minimal sketch of such a zero-order gradient estimator also appears after this list).
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions.
We consider the aforementioned problems in both the regret minimization and the best arm identification frameworks for multi-armed bandits.
arXiv Detail & Related papers (2016-11-30T17:37:51Z)
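As a worked companion to the "Lenient Regret for Multi-Armed Bandits" entry above, here is a minimal sketch of the lenient regret criterion using the simplest hard-threshold gap function; the paper works with a more general family of gap functions, so this particular choice is an illustrative assumption.
```python
# Lenient regret sketch: suboptimality gaps no larger than eps are
# forgiven, so only gaps exceeding eps contribute to the total.
# The hard threshold below is the simplest choice of gap function
# and is an illustrative assumption.
import numpy as np

def lenient_regret(mu, actions, eps):
    """Sum of suboptimality gaps, counting only gaps larger than eps."""
    gaps = mu.max() - mu[np.asarray(actions)]
    return float(np.sum(np.where(gaps > eps, gaps, 0.0)))

mu = np.array([1.0, 0.95, 0.5])       # hypothetical arm means
actions = [0, 1, 1, 2, 1, 0]          # hypothetical action sequence
print(lenient_regret(mu, actions, eps=0.1))   # arm 1's 0.05 gaps are forgiven; prints 0.5
```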
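Similarly, for the "Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits" entry above, here is a sketch of a basic randomized two-point zero-order gradient estimate driving a few plain gradient steps (the paper's algorithm is a projected variant that additionally uses smoothing kernels to exploit higher-order smoothness, which this sketch omits); the test function, step size, and noise level are illustrative assumptions.
```python
# Basic two-point randomized gradient estimate for zero-order
# optimization: query f at x + h*u and x - h*u along a random unit
# direction u and form a finite difference. This omits the smoothing
# kernels the paper uses to exploit higher-order smoothness.
import numpy as np

rng = np.random.default_rng(1)

def two_point_gradient(f, x, h=0.05):
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # random unit direction
    return d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u

# Noisy strongly convex quadratic (illustrative), a few gradient steps.
f = lambda x: float(np.sum(x ** 2) + rng.normal(scale=1e-3))
x = np.ones(5)
for _ in range(200):
    x = x - 0.05 * two_point_gradient(f, x)
print(np.linalg.norm(x))   # much smaller than the initial norm sqrt(5)
```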
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.