Best-Arm Identification in Correlated Multi-Armed Bandits
- URL: http://arxiv.org/abs/2109.04941v1
- Date: Fri, 10 Sep 2021 15:41:07 GMT
- Title: Best-Arm Identification in Correlated Multi-Armed Bandits
- Authors: Samarth Gupta, Gauri Joshi, Osman Yağan
- Abstract summary: We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in \mathcal{K}} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting.
- Score: 9.180690233707645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we consider the problem of best-arm identification in
multi-armed bandits in the fixed confidence setting, where the goal is to
identify, with probability $1-\delta$ for some $\delta>0$, the arm with the
highest mean reward using the minimum possible number of samples from the set of arms
$\mathcal{K}$. Most existing best-arm identification algorithms and analyses
operate under the assumption that the rewards corresponding to different arms
are independent of each other. We propose a novel correlated bandit framework
that captures domain knowledge about correlation between arms in the form of
upper bounds on expected conditional reward of an arm, given a reward
realization from another arm. Our proposed algorithm C-LUCB, which generalizes
the LUCB algorithm, utilizes this partial knowledge of correlations to sharply
reduce the sample complexity of best-arm identification. More interestingly, we
show that the total samples obtained by C-LUCB are of the form
$\mathcal{O}\left(\sum_{k \in \mathcal{C}}
\log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical
$\mathcal{O}\left(\sum_{k \in \mathcal{K}}
\log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent
reward setting. The improvement arises because the $\mathcal{O}(\log(1/\delta))$
term is summed only over the set of competitive arms $\mathcal{C}$, which is a
subset of the original set of arms $\mathcal{K}$. The size of the set
$\mathcal{C}$, depending on the problem setting, can be as small as $2$, and
hence using C-LUCB in the correlated bandits setting can lead to significant
performance improvements. Our theoretical findings are supported by experiments
on the Movielens and Goodreads recommendation datasets.
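
As a concrete illustration of the framework described in the abstract, the sketch below shows one way pseudo-reward upper bounds (upper bounds on the expected reward of an arm conditioned on an observed reward of another arm) could be used to screen out non-competitive arms before a LUCB-style sampling phase. This is a minimal, hypothetical Python sketch: the `pseudo_reward` data structure, the linear bounds in the toy example, and the simple thresholding rule are illustrative assumptions, not the paper's exact C-LUCB procedure.

```python
# Hypothetical sketch: screening arms with pseudo-reward upper bounds
# pseudo_reward[(k, l)](r) >= E[reward of arm k | arm l yielded reward r].
# The thresholding rule below is an illustrative assumption, not the
# paper's exact C-LUCB procedure.
import numpy as np

def competitive_set(samples, pseudo_reward):
    """samples: dict arm -> 1-D array of observed rewards.
    pseudo_reward: dict (k, l) -> callable r -> upper bound on E[X_k | X_l = r].
    Returns the arms that cannot be ruled out from the bounds alone."""
    arms = list(samples.keys())
    means = {k: float(np.mean(samples[k])) for k in arms}
    best = max(arms, key=means.get)  # empirically best arm so far
    comp = {best}
    for k in arms:
        if k == best:
            continue
        # Empirical pseudo-reward of arm k, estimated from the best arm's
        # samples: an upper-bound estimate of arm k's true mean.
        pseudo_mean = float(np.mean([pseudo_reward[(k, best)](r)
                                     for r in samples[best]]))
        if pseudo_mean >= means[best]:
            comp.add(k)  # bound does not rule out arm k -> keep it competitive
    return comp

# Toy usage: 3 arms, arm 0 has the highest mean; pseudo-reward bounds are
# linear in the observed reward (purely illustrative numbers).
rng = np.random.default_rng(0)
samples = {0: rng.normal(0.9, 0.1, 200),
           1: rng.normal(0.6, 0.1, 200),
           2: rng.normal(0.2, 0.1, 200)}
pseudo = {(1, 0): lambda r: r + 0.05,   # loose bound: arm 1 stays competitive
          (2, 0): lambda r: 0.3 * r}    # tight bound: arm 2 is ruled out
print(sorted(competitive_set(samples, pseudo)))  # -> [0, 1]
```

Per the abstract's result, a LUCB-style routine would then only need to drive the $\mathcal{O}(\log(1/\delta))$ confidence terms down for arms in this competitive set $\mathcal{C}$ rather than for all of $\mathcal{K}$.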
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $\mathtt{UE\text{-}UCB++}$, which matches this lower bound up to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - Revisiting Simple Regret Minimization in Multi-Armed Bandits [33.88679679593314]
Simple regret is a less popular performance measure than the probability of missing the best arm or an $\epsilon$-good arm.
In this paper, we achieve improved simple regret upper bounds for both the data-rich ($T \ge n$) and data-poor ($T \le n$) regimes.
For the more challenging data-poor regime, we propose Bracketing SH (BSH) that enjoys the same improvement even without sampling each arm at least once.
arXiv Detail & Related papers (2022-10-30T18:31:03Z) - Variance-Dependent Best Arm Identification [12.058652922228918]
We study the problem of identifying the best arm in a multi-armed bandit game.
We propose an adaptive algorithm which explores the gaps and variances of the rewards of the arms.
We show that our algorithm is optimal up to doubly logarithmic terms.
arXiv Detail & Related papers (2021-06-19T04:13:54Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - An Optimal Elimination Algorithm for Learning a Best Arm [37.18327505953403]
We consider the classic problem of $(\epsilon,\delta)$-PAC learning a best arm.
We propose a new approach for $(\epsilon,\delta)$-PAC learning a best arm.
arXiv Detail & Related papers (2020-06-20T20:21:33Z)