Quantum exploration algorithms for multi-armed bandits
- URL: http://arxiv.org/abs/2007.07049v2
- Date: Tue, 15 Dec 2020 18:31:01 GMT
- Title: Quantum exploration algorithms for multi-armed bandits
- Authors: Daochen Wang, Xuchen You, Tongyang Li, Andrew M. Childs
- Abstract summary: We show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^n \Delta_i^{-2}}\bigr)$ quantum queries.
This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result.
- Score: 15.666205208594567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying the best arm of a multi-armed bandit is a central problem in
bandit optimization. We study a quantum computational version of this problem
with coherent oracle access to states encoding the reward probabilities of each
arm as quantum amplitudes. Specifically, we show that we can find the best arm
with fixed confidence using
$\tilde{O}\bigl(\sqrt{\sum_{i=2}^n\Delta^{\smash{-2}}_i}\bigr)$ quantum
queries, where $\Delta_{i}$ represents the difference between the mean reward
of the best arm and the $i^\text{th}$-best arm. This algorithm, based on
variable-time amplitude amplification and estimation, gives a quadratic speedup
compared to the best possible classical result. We also prove a matching
quantum lower bound (up to poly-logarithmic factors).
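To make the comparison concrete, here is a minimal Python sketch (illustrative only, not code from the paper) that computes the classical gap-dependent hardness $H = \sum_{i=2}^n \Delta_i^{-2}$ and the corresponding quantum query bound $\sqrt{H}$ for a hypothetical set of arm means:

```python
import math

def classical_hardness(means):
    """Gap-dependent hardness H = sum_{i>=2} Delta_i^{-2}, where Delta_i is the
    gap between the best mean and the i-th best mean (best arm excluded).
    Classical fixed-confidence best-arm identification needs ~H samples,
    up to logarithmic factors."""
    best = max(means)
    gaps = sorted(best - m for m in means)[1:]  # drop the zero gap of the best arm
    return sum(gap ** -2 for gap in gaps)

means = [0.9, 0.8, 0.7, 0.5]  # hypothetical arm reward means
H = classical_hardness(means)
print(f"classical queries ~ H       = {H:.1f}")             # ~131.2
print(f"quantum queries   ~ sqrt(H) = {math.sqrt(H):.1f}")   # ~11.5
```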
Related papers
- A General Framework for Clustering and Distribution Matching with Bandit Feedback [81.50716021326194]
We develop a general framework for clustering and distribution matching problems with bandit feedback.
We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding $\delta$.
arXiv Detail & Related papers (2024-09-08T12:19:12Z) - Optimal Multi-Fidelity Best-Arm Identification [65.23078799972188]
In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible.
We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost.
Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm.
arXiv Detail & Related papers (2024-06-05T08:02:40Z) - Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem [10.994427113326996]
An instance of this problem has $k$ arms, each with a reward function that is a concave, increasing function of the number of times that arm has been pulled so far.
We show that for any randomized online algorithm, there exists an instance on which it must suffer at least an $\Omega(\sqrt{k})$ approximation factor relative to the optimal reward.
We then show how to remove this assumption at the cost of an extra $O(\sqrt{k} \log k)$ approximation factor, achieving an overall $O(\sqrt{k} \log k)$ approximation.
arXiv Detail & Related papers (2024-04-01T15:55:45Z) - Quantum Heavy-tailed Bandits [36.458771174473924]
We study multi-armed bandits (MAB) and stochastic linear bandits (SLB) with heavy-tailed rewards and a quantum reward oracle.
We first propose a new quantum mean estimator for heavy-tailed distributions, which is based on the Quantum Monte Carlo Estimator.
Based on our quantum mean estimator, we focus on quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the Upper Confidence Bound (UCB) framework.
arXiv Detail & Related papers (2023-01-23T19:23:10Z) - Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination [80.53485617514707]
This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits.
Specifically, the sample complexity of the gap-based algorithm is optimal up to constant factors, while that of successive elimination is optimal up to logarithmic factors.
arXiv Detail & Related papers (2021-11-14T21:49:58Z) - Best-Arm Identification in Correlated Multi-Armed Bandits [9.180690233707645]
We propose a novel correlated bandit framework that captures domain knowledge about correlation between arms.
We show that the total samples obtained by C-LUCB are of the form $\mathcal{O}\left(\sum_{k \in \mathcal{C}} \log\left(\frac{1}{\delta}\right)\right)$ as opposed to the typical $\mathcal{O}\left(\sum_{k \in [K]} \log\left(\frac{1}{\delta}\right)\right)$ samples required in the independent reward setting.
arXiv Detail & Related papers (2021-09-10T15:41:07Z) - Continuous Time Bandits With Sampling Costs [17.412117389855222]
We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample.
There is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency.
The goal is to design a learning algorithm that minimizes regret, defined as the difference between the payoff of the oracle policy and that of the learning algorithm.
arXiv Detail & Related papers (2021-07-12T10:00:35Z) - Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy [71.17938026619068]
We study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous.
We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weighting strategy.
We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log(|\mathcal{F}|T)})$.
arXiv Detail & Related papers (2021-02-15T19:10:52Z) - Optimal Best-arm Identification in Linear Bandits [79.3239137440876]
We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds.
Unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms.
arXiv Detail & Related papers (2020-06-29T14:25:51Z) - Robustness Guarantees for Mode Estimation with an Application to Bandits [131.21717367564963]
We introduce a theory for multi-armed bandits where the value of an arm is the mode of its reward distribution instead of the mean.
We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences.
arXiv Detail & Related papers (2020-03-05T21:29:27Z) - Quantum Bandits [10.151012770913622]
We consider the quantum version of the bandit problem known as best arm identification (BAI).
We first propose a quantum modeling of the BAI problem, which assumes that both the learning agent and the environment are quantum.
We then propose an algorithm based on quantum amplitude amplification to solve BAI (for intuition, a toy simulation of amplitude amplification is sketched after this list).
arXiv Detail & Related papers (2020-02-15T15:17:11Z)
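Both the headline paper and the last entry above rely on amplitude amplification, the source of the quadratic quantum speedup. The following minimal NumPy statevector simulation (a toy illustration under standard Grover-search assumptions, not code from either paper) shows how roughly $\sqrt{N}$ reflections boost the probability of one marked state among $N$:

```python
import numpy as np

# Toy statevector simulation of amplitude amplification (Grover iteration):
# amplify the amplitude of a single "good" basis state among N.
N, good = 64, 7
state = np.full(N, 1 / np.sqrt(N))   # uniform superposition over N basis states

def grover_step(v):
    v = v.copy()
    v[good] *= -1.0                  # oracle: flip the sign of the good amplitude
    return 2 * v.mean() - v          # diffusion: inversion about the mean

steps = int(np.pi / 4 * np.sqrt(N))  # ~(pi/4) * sqrt(N) iterations suffice
for _ in range(steps):
    state = grover_step(state)

# Probability of measuring the good state is near 1 after only ~sqrt(N) steps,
# versus ~N classical queries to find it by random sampling.
print(f"{steps} iterations, success probability = {state[good] ** 2:.3f}")
```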