Achieving the Pareto Frontier of Regret Minimization and Best Arm
Identification in Multi-Armed Bandits
- URL: http://arxiv.org/abs/2110.08627v3
- Date: Fri, 9 Jun 2023 05:14:58 GMT
- Title: Achieving the Pareto Frontier of Regret Minimization and Best Arm
Identification in Multi-Armed Bandits
- Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
- Abstract summary: We design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm.
We show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives, and (ii) BoBW-lil'UCB$(\gamma)$ achieves order-wise optimal performance for RM or BAI depending on the value of $\gamma$.
We also show that BoBW-lil'UCB$(\gamma)$ outperforms the close competitor UCB$_\alpha$ in terms of time complexity and regret.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the Pareto frontier of two archetypal objectives in multi-armed
bandits, namely, regret minimization (RM) and best arm identification (BAI)
with a fixed horizon. It is folklore that the balance between exploitation and
exploration is crucial for both RM and BAI, but exploration is more critical in
achieving the optimal performance for the latter objective. To this end, we
design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm. Complementarily, by
establishing lower bounds on the regret achievable by any algorithm with a
given BAI failure probability, we show that (i) no algorithm can simultaneously
perform optimally for both the RM and BAI objectives, and (ii)
BoBW-lil'UCB$(\gamma)$ achieves order-wise optimal performance for RM or BAI
under different values of $\gamma$. Our work elucidates the trade-off more
precisely by showing how the constants in previous works depend on certain
hardness parameters. Finally, we show that BoBW-lil'UCB outperforms a close
competitor UCB$_\alpha$ (Degenne et al., 2019) in terms of the time complexity
and the regret on diverse datasets such as MovieLens and Published Kinase
Inhibitor Set.
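Since the abstract only names BoBW-lil'UCB$(\gamma)$, the following is a minimal sketch of how a lil'UCB-style index with an exploration-scaling parameter $\gamma$ can interpolate between the two objectives. The bonus constants, the Gaussian noise model, and the most-pulled-arm recommendation rule are illustrative assumptions, not the paper's exact specification.

```python
import math
import random


def lil_ucb_index(mean: float, pulls: int, gamma: float,
                  delta: float = 0.01, sigma: float = 0.5) -> float:
    """lil'UCB-style index: empirical mean plus an iterated-logarithm bonus.

    gamma scales exploration: small gamma favors regret minimization (RM),
    large gamma keeps separating near-optimal arms, favoring BAI.
    Constants are illustrative, not the paper's exact bonus.
    """
    bonus = gamma * sigma * math.sqrt(
        2.0 * math.log(math.log(math.e * pulls) / delta) / pulls
    )
    return mean + bonus


def run_bandit(means, horizon, gamma, seed=0):
    """Play `horizon` rounds; return (recommended arm, pull counts)."""
    rng = random.Random(seed)
    k = len(means)
    pulls, sums = [0] * k, [0.0] * k

    def pull(a):
        pulls[a] += 1
        sums[a] += means[a] + rng.gauss(0.0, 0.5)  # Gaussian rewards (assumed)

    for a in range(k):          # initialize: one pull per arm
        pull(a)
    for _ in range(k, horizon):
        indices = [lil_ucb_index(sums[a] / pulls[a], pulls[a], gamma)
                   for a in range(k)]
        pull(max(range(k), key=indices.__getitem__))
    # recommend the most-pulled arm (a common, but here assumed, rule)
    return max(range(k), key=pulls.__getitem__), pulls


best, counts = run_bandit([0.5, 0.45, 0.3], horizon=2000, gamma=1.0)
```

Sweeping $\gamma$ traces the empirical trade-off: at small $\gamma$ the cumulative regret is lower but the final recommendation is wrong more often on close instances, mirroring the Pareto frontier the paper characterizes.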
Related papers
- Best Arm Identification with Minimal Regret
The best arm identification problem elegantly amalgamates regret minimization and BAI.
The agent's goal is to identify the best arm with a prescribed confidence level.
The Double KL-UCB algorithm achieves optimality as the confidence level tends to zero.
arXiv Detail & Related papers (2024-09-27T16:46:02Z)
- UCB Exploration for Fixed-Budget Bayesian Best Arm Identification
We study best-arm identification (BAI) in the fixed-budget setting.
We propose a UCB exploration algorithm that is both theoretically and empirically efficient for the fixed-budget BAI problem in a Bayesian setting.
arXiv Detail & Related papers (2024-08-09T05:15:36Z)
- Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms
We study the Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification of and commitment to the optimal arm, and (ii) reward maximization over a sequence of $T$ consecutive rounds.
This paper introduces Regret Optimal Best Arm Identification (ROBAI), which aims to achieve these dual objectives.
arXiv Detail & Related papers (2023-09-01T17:12:43Z)
- Multi-Fidelity Multi-Armed Bandits Revisited
We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem.
MF-MAB allows each arm to be pulled at different fidelities, which differ in cost and observation accuracy.
arXiv Detail & Related papers (2023-06-13T13:19:20Z)
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms
We study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
- Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both the stochastic and adversarial settings.
arXiv Detail & Related papers (2022-06-14T12:58:46Z)
- Best Arm Identification under Additive Transfer Bandits
We study the setting in which, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $\epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z)
- Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain.
We propose Restless-UCB, a learning policy that follows the explore-then-commit framework (sketched below).
arXiv Detail & Related papers (2020-11-05T05:16:04Z)
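The summary above names the explore-then-commit framework without describing it; below is a minimal two-phase sketch on i.i.d. Gaussian arms. Restless-UCB itself handles Markovian (restless) arm states, which this sketch deliberately omits; the reward model and budget split are assumptions for illustration.

```python
import random


def explore_then_commit(means, horizon, pulls_per_arm, seed=0):
    """Two-phase skeleton: uniform exploration, then commit to the leader.

    Illustrates only the framework; Restless-UCB's actual policy estimates
    Markov-chain statistics for restless arms rather than i.i.d. means.
    """
    rng = random.Random(seed)
    k = len(means)
    # phase 1: pull every arm the same fixed number of times
    estimates = [
        sum(means[a] + rng.gauss(0.0, 0.5) for _ in range(pulls_per_arm))
        / pulls_per_arm
        for a in range(k)
    ]
    best = max(range(k), key=estimates.__getitem__)
    # phase 2: commit to the empirical best arm for the remaining rounds
    remaining = horizon - k * pulls_per_arm
    reward = sum(means[best] + rng.gauss(0.0, 0.5) for _ in range(remaining))
    return best, reward


arm, total = explore_then_commit([0.6, 0.5, 0.4], horizon=1000, pulls_per_arm=50)
```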
- Lenient Regret for Multi-Armed Bandits
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and that of the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some $\epsilon$ (one concrete form is given below).
arXiv Detail & Related papers (2020-08-10T08:30:52Z)
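To make the lenient criterion concrete, one simple instance consistent with the summary above replaces each per-round gap with a thresholded gap. The hinge form below is an assumption; the paper may study a broader family of such gap functions.

```latex
% Standard regret vs. one lenient instance that ignores gaps below \epsilon.
\[
  R(T) = \sum_{t=1}^{T} \Delta_{a_t},
  \qquad
  R_{\epsilon}(T) = \sum_{t=1}^{T} \Delta_{a_t}\,
    \mathbb{1}\{\Delta_{a_t} > \epsilon\},
  \qquad
  \Delta_{a} = \mu^{\ast} - \mu_{a},
\]
% where a_t is the arm pulled at round t, \mu_a is the mean reward of
% arm a, and \mu^\ast is the best mean reward.
```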