Quantum exploration algorithms for multi-armed bandits
- URL: http://arxiv.org/abs/2007.07049v2
- Date: Tue, 15 Dec 2020 18:31:01 GMT
- Title: Quantum exploration algorithms for multi-armed bandits
- Authors: Daochen Wang, Xuchen You, Tongyang Li, Andrew M. Childs
- Abstract summary: We show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^n \Delta_i^{-2}}\bigr)$ quantum queries.
This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result.
- Score: 15.666205208594567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying the best arm of a multi-armed bandit is a central problem in
bandit optimization. We study a quantum computational version of this problem
with coherent oracle access to states encoding the reward probabilities of each
arm as quantum amplitudes. Specifically, we show that we can find the best arm
with fixed confidence using
$\tilde{O}\bigl(\sqrt{\sum_{i=2}^n\Delta^{\smash{-2}}_i}\bigr)$ quantum
queries, where $\Delta_{i}$ represents the difference between the mean reward
of the best arm and the $i^\text{th}$-best arm. This algorithm, based on
variable-time amplitude amplification and estimation, gives a quadratic speedup
compared to the best possible classical result. We also prove a matching
quantum lower bound (up to poly-logarithmic factors).
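To make the comparison concrete, here is a minimal Python sketch (illustrative only, not code from the paper) that computes the classical gap-dependent hardness $H = \sum_{i=2}^n \Delta_i^{-2}$ and the corresponding quantum query bound $\sqrt{H}$ for a hypothetical set of arm means:

```python
import math

def classical_hardness(means):
    """Gap-dependent hardness H = sum_{i>=2} Delta_i^{-2}, where Delta_i is the
    gap between the best mean and the i-th best mean (best arm excluded).
    Classical fixed-confidence best-arm identification needs ~H samples,
    up to logarithmic factors."""
    best = max(means)
    gaps = sorted(best - m for m in means)[1:]  # drop the zero gap of the best arm
    return sum(gap ** -2 for gap in gaps)

means = [0.9, 0.8, 0.7, 0.5]  # hypothetical arm reward means
H = classical_hardness(means)
print(f"classical queries ~ H       = {H:.1f}")             # ~131.2
print(f"quantum queries   ~ sqrt(H) = {math.sqrt(H):.1f}")   # ~11.5
```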
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem [10.994427113326996]
An instance of this problem has $k$ arms, each with a reward function that is a concave, increasing function of the number of times that arm has been pulled so far.
We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward.
We then show how to remove this assumption at the cost of an extra $O(\sqrt{k} \log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation.
arXiv Detail & Related papers (2024-04-01T15:55:45Z) - Quantum Heavy-tailed Bandits [36.458771174473924]
We study multi-armed bandits (MAB) and stochastic linear bandits (SLB) with heavy-tailed rewards and a quantum reward oracle.
We first propose a new quantum mean estimator for heavy-tailed distributions, which is based on the Quantum Monte Carlo Estimator.
Based on our quantum mean estimator, we focus on quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the Upper Confidence Bound (UCB) framework.
arXiv Detail & Related papers (2023-01-23T19:23:10Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in [K]} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting.
arXiv Detail & Related papers (2021-09-10T15:41:07Z) - Continuous Time Bandits With Sampling Costs [17.412117389855222]
We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample.
There is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency.
The goal is to design a learning algorithm that minimizes regret, defined as the difference between the payoff of the oracle policy and that of the learning algorithm.
arXiv Detail & Related papers (2021-07-12T10:00:35Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the value of an arm is the mode of its reward distribution instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Quantum Bandits [10.151012770913622]
We consider the quantum version of the bandit problem known as best arm identification (BAI).
We first propose a quantum modeling of the BAI problem, which assumes that both the learning agent and the environment are quantum.
We then propose an algorithm based on quantum amplitude amplification to solve BAI (for intuition, a toy simulation of amplitude amplification is sketched after this list).
arXiv Detail & Related papers (2020-02-15T15:17:11Z)
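Both the headline paper and the last entry above rely on amplitude amplification, the source of the quadratic quantum speedup. The following minimal NumPy statevector simulation (a toy illustration under standard Grover-search assumptions, not code from either paper) shows how roughly $\sqrt{N}$ reflections boost the probability of one marked state among $N$:

```python
import numpy as np

# Toy statevector simulation of amplitude amplification (Grover iteration):
# amplify the amplitude of a single "good" basis state among N.
N, good = 64, 7
state = np.full(N, 1 / np.sqrt(N))   # uniform superposition over N basis states

def grover_step(v):
    v = v.copy()
    v[good] *= -1.0                  # oracle: flip the sign of the good amplitude
    return 2 * v.mean() - v          # diffusion: inversion about the mean

steps = int(np.pi / 4 * np.sqrt(N))  # ~(pi/4) * sqrt(N) iterations suffice
for _ in range(steps):
    state = grover_step(state)

# Probability of measuring the good state is near 1 after only ~sqrt(N) steps,
# versus ~N classical queries to find it by random sampling.
print(f"{steps} iterations, success probability = {state[good] ** 2:.3f}")
```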