Achieving the Pareto Frontier of Regret Minimization and Best Arm
Identification in Multi-Armed Bandits
- URL: http://arxiv.org/abs/2110.08627v3
- Date: Fri, 9 Jun 2023 05:14:58 GMT
- Title: Achieving the Pareto Frontier of Regret Minimization and Best Arm
Identification in Multi-Armed Bandits
- Authors: Zixin Zhong, Wang Chi Cheung, Vincent Y. F. Tan
- Abstract summary: We design and analyze the BoBW-lil'UCB$(gamma)$ algorithm.
We show that (i) no algorithm can simultaneously perform optimally for both the RM and BAI objectives.
We also show that BoBW-lil'UCB$(gamma)$ outperforms a competitor in terms of the time complexity and the regret.
- Score: 91.8283876874947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the Pareto frontier of two archetypal objectives in multi-armed
bandits, namely, regret minimization (RM) and best arm identification (BAI)
with a fixed horizon. It is folklore that the balance between exploitation and
exploration is crucial for both RM and BAI, but exploration is more critical in
achieving the optimal performance for the latter objective. To this end, we
design and analyze the BoBW-lil'UCB$(\gamma)$ algorithm. Complementarily, by
establishing lower bounds on the regret achievable by any algorithm with a
given BAI failure probability, we show that (i) no algorithm can simultaneously
perform optimally for both the RM and BAI objectives, and (ii)
BoBW-lil'UCB$(\gamma)$ achieves order-wise optimal performance for RM or BAI
under different values of $\gamma$. Our work elucidates the trade-off more
precisely by showing how the constants in previous works depend on certain
hardness parameters. Finally, we show that BoBW-lil'UCB outperforms a close
competitor UCB$_\alpha$ (Degenne et al., 2019) in terms of the time complexity
and the regret on diverse datasets such as MovieLens and Published Kinase
Inhibitor Set.
Related papers
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms [15.038210624870656]
Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward throughout a sequence of $T$ consecutive rounds.
This paper introduces emphRegret Best Arm Identification (ROBAI) which aims to achieve these dual objectives.
arXiv Detail & Related papers (2023-09-01T17:12:43Z) - Multi-Fidelity Multi-Armed Bandits Revisited [46.19926456682379]
We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem.
MF-MAB allows each arm to be pulled with different costs (fidelities) and observation accuracy.
arXiv Detail & Related papers (2023-06-13T13:19:20Z) - Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with
Probabilistically Triggered Arms or Independent Arms [53.89752069984382]
We study the semi-bandits (CMAB) and focus on reducing the dependency of the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages on the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z) - Adversarially Robust Multi-Armed Bandit Algorithm with
Variance-Dependent Regret Bounds [34.37963000493442]
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both adversarial settings.
arXiv Detail & Related papers (2022-06-14T12:58:46Z) - Best Arm Identification under Additive Transfer Bandits [49.69203462561861]
We study the setting when, despite the means being unknown, there is a known additive relationship between the source and target MAB instances.
We propose and theoretically analyze an LUCB-style algorithm to identify an $epsilon$-optimal target arm with high probability.
arXiv Detail & Related papers (2021-12-08T02:20:18Z) - Restless-UCB, an Efficient and Low-complexity Algorithm for Online
Restless Bandits [61.490254407420906]
We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain.
We propose Restless-UCB, a learning policy that follows the explore-then-commit framework.
arXiv Detail & Related papers (2020-11-05T05:16:04Z) - Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took.
While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results.
We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $epsilon$.
arXiv Detail & Related papers (2020-08-10T08:30:52Z) - The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms [10.662105162882526]
We investigate a $k$-armed bandit problem in the emphmany-armed regime.
Our findings suggest a new form of emphfree exploration beneficial to greedy algorithms in the many-armed context.
arXiv Detail & Related papers (2020-02-24T08:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.