Related papers: Variance-Dependent Best Arm Identification

Variance-Dependent Best Arm Identification

URL: http://arxiv.org/abs/2106.10417v3
Date: Sat, 27 May 2023 07:30:09 GMT
Title: Variance-Dependent Best Arm Identification
Authors: Pinyan Lu, Chao Tao, Xiaojin Zhang
Abstract summary: We study the problem of identifying the best arm in a multi-armed bandit game. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms. We show that our algorithm is optimal up to doubly logarithmic terms.
Score: 12.058652922228918
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the problem of identifying the best arm in a stochastic multi-armed bandit game. Given a set of $n$ arms indexed from $1$ to $n$, each arm $i$ is associated with an unknown reward distribution supported on $[0,1]$ with mean $\theta_i$ and variance $\sigma_i^2$. Assume $\theta_1 > \theta_2 \geq \cdots \geq\theta_n$. We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms and makes future decisions based on the gathered information using a novel approach called \textit{grouped median elimination}. The proposed algorithm guarantees to output the best arm with probability $(1-\delta)$ and uses at most $O \left(\sum_{i = 1}^n \left(\frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i}\right)(\ln \delta^{-1} + \ln \ln \Delta_i^{-1})\right)$ samples, where $\Delta_i$ ($i \geq 2$) denotes the reward gap between arm $i$ and the best arm and we define $\Delta_1 = \Delta_2$. This achieves a significant advantage over the variance-independent algorithms in some favorable scenarios and is the first result that removes the extra $\ln n$ factor on the best arm compared with the state-of-the-art. We further show that $\Omega \left( \sum_{i = 1}^n \left( \frac{\sigma_i^2}{\Delta_i^2} + \frac{1}{\Delta_i} \right) \ln \delta^{-1} \right)$ samples are necessary for an algorithm to achieve the same goal, thereby illustrating that our algorithm is optimal up to doubly logarithmic terms.

Related papers

Multi-objective Good Arm Identification with Bandit Feedback [0.589889361990138]
We consider a good arm identification problem in a bandit setting with multi-objectives. For each round $t$, the player/algorithm pulls one arm $i_t$ and receives a $M$ dimensional vector feedback sampled according to $mathcalD_i_t$. The proposed algorithm attains better numerical performance than other baselines in the experiments on synthetic and real datasets.
arXiv Detail & Related papers (2025-03-13T14:04:04Z)
Optimal Streaming Algorithms for Multi-Armed Bandits [28.579280943038555]
We study the streaming BAI problem, where the objective is to identify the arm with the maximum reward mean with at least $1-delta$ probability. We present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within $O(log Delta-1)$ passes.
arXiv Detail & Related papers (2024-10-23T12:54:04Z)
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity [28.068960555415014]
We investigate the fixed-budget best-arm identification problem for linear bandits in a potentially non-stationary environment. An algorithm will aim to correctly identify the best arm $x* := argmax_xinmathcalXxtopsum_t=1Ttheta_t$ with probability as high as possible.
arXiv Detail & Related papers (2023-07-27T19:03:36Z)
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits [4.811176167998627]
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high quality arm whose average reward is, with probability $1-delta$, within $varepsilon$ of being among the top $eta$-fraction of arms.
arXiv Detail & Related papers (2023-06-03T04:00:47Z)
Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $widetildeThetaleft(sqrtdTright)$. In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $widetildemathcal O(1)$ regret. We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z)
Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances [11.555138461092177]
We show that the worst-case sample complexity of this problem is $$Thetaleft( sum_i =1n fracsigma_i2epsilon2 lnfrac1delta + sum_i in Gm fracsigma_j2epsilon2 textEnt(sigma2_Gr) right),$$ where $Gm, Gl, Gr
arXiv Detail & Related papers (2022-04-11T16:41:33Z)
Differentially Private Multi-Armed Bandits in the Shuffle Model [58.22098764071924]
We give an $(varepsilon,delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model. Our upper bound almost matches the regret of the best known algorithms for the centralized model, and significantly outperforms the best known algorithm in the local model.
arXiv Detail & Related papers (2021-06-05T14:11:01Z)
Bandits with many optimal arms [68.17472536610859]
We write $p*$ for the proportion of optimal arms and $Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z)
Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms. We develop a novel regret analysis and show an $Oleft(frack2 n log Tepsilonright)$ gap-dependent regret bound as well as an $Oleft(k2sqrtn T log Tright)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z)
Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy. We show that our algorithm has a regret guarantee of $O(ksqrt(A-k+1)T log (|mathcalF|T))$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z)
Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $det(sum_i in Sv_i v_i v_itop)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.