Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy
Logarithmic Regrets
- URL: http://arxiv.org/abs/2205.14988v1
- Date: Mon, 30 May 2022 10:54:53 GMT
- Title: Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy
Logarithmic Regrets
- Authors: Zongqi Wan, Zhijie Zhang, Tongyang Li, Jialin Zhang, Xiaoming Sun
- Abstract summary: Multi-armed bandit (MAB) and stochastic linear bandit (SLB) are important models in reinforcement learning.
We propose quantum algorithms for both models with $O(\mbox{poly}(\log T))$ regrets.
This is the first provable quantum speedup for regrets of bandit problems and in general exploitation in reinforcement learning.
- Score: 20.426493569846855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-arm bandit (MAB) and stochastic linear bandit (SLB) are important
models in reinforcement learning, and it is well-known that classical
algorithms for bandits with time horizon $T$ suffer $\Omega(\sqrt{T})$ regret.
In this paper, we study MAB and SLB with quantum reward oracles and propose
quantum algorithms for both models with $O(\mbox{poly}(\log T))$ regrets,
exponentially improving the dependence in terms of $T$. To the best of our
knowledge, this is the first provable quantum speedup for regrets of bandit
problems and in general exploitation in reinforcement learning. Compared to
previous literature on quantum exploration algorithms for MAB and reinforcement
learning, our quantum input model is simpler and only assumes quantum oracles
for each individual arm.
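As a classical point of reference for the $\Omega(\sqrt{T})$ baseline the abstract improves upon (not code from the paper itself), here is a minimal UCB1 sketch on Bernoulli arms; all names and parameters are illustrative assumptions:

```python
import math
import random

def ucb1(means, horizon, rng):
    """Minimal UCB1 on Bernoulli arms; returns cumulative pseudo-regret.

    Classical index policies of this kind incur Theta(sqrt(T)) worst-case
    regret in T, which is the baseline the quantum algorithms above
    improve to poly(log T).
    """
    n = len(means)
    counts = [0] * n      # pulls per arm
    sums = [0.0] * n      # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(horizon):
        if t < n:
            arm = t  # play each arm once to initialize
        else:
            # index = empirical mean + exploration bonus
            arm = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

rng = random.Random(0)
print(ucb1([0.5, 0.6, 0.9], 5000, rng))
```

On a run like the one above, the cumulative regret stays far below the horizon, illustrating the sublinear (but still polynomial-in-$T$) classical rate.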
Related papers
- Hybrid Quantum-Classical Machine Learning with String Diagrams [49.1574468325115]
This paper develops a formal framework for describing hybrid algorithms in terms of string diagrams.
A notable feature of our string diagrams is the use of functor boxes, which correspond to quantum-classical interfaces.
arXiv Detail & Related papers (2024-07-04T06:37:16Z) - A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games [102.46640028830441]
We introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ to $\epsilon$-Nash equilibria.
This quadratic speed-up sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
arXiv Detail & Related papers (2023-11-17T20:38:38Z) - Quantum Bayesian Optimization [64.58749619145908]
We introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm.
It is the first BO algorithm able to achieve a regret upper bound of $O(\mbox{polylog}\, T)$, which is significantly smaller than its regret lower bound of $\Omega(\sqrt{T})$ in the classical setting.
Thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm.
arXiv Detail & Related papers (2023-10-09T03:10:42Z) - Quantum Heavy-tailed Bandits [36.458771174473924]
We study multi-armed bandits (MAB) and stochastic linear bandits (SLB) with heavy-tailed rewards and quantum reward oracles.
We first propose a new quantum mean estimator for heavy-tailed distributions, which is based on the Quantum Monte Carlo Estimator.
Based on our quantum mean estimator, we focus on quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the Upper Confidence Bound (UCB) framework.
arXiv Detail & Related papers (2023-01-23T19:23:10Z) - Quantum Worst-Case to Average-Case Reductions for All Linear Problems [66.65497337069792]
We study the problem of designing worst-case to average-case reductions for quantum algorithms.
We provide an explicit and efficient transformation of quantum algorithms that are only correct on a small fraction of their inputs into ones that are correct on all inputs.
arXiv Detail & Related papers (2022-12-06T22:01:49Z) - Testing and Learning Quantum Juntas Nearly Optimally [3.030969076856776]
We consider the problem of testing and learning quantum $k$-juntas.
We give (a) a $\widetilde{O}(\sqrt{k})$-query quantum algorithm that can distinguish quantum $k$-juntas from unitary matrices that are "far" from every quantum $k$-junta.
We complement our upper bounds for testing quantum $k$-juntas and learning quantum $k$-juntas with near-matching lower bounds of $\Omega(\sqrt{k})$ and $\Omega(\frac
arXiv Detail & Related papers (2022-07-13T00:11:14Z) - A single $T$-gate makes distribution learning hard [56.045224655472865]
This work provides an extensive characterization of the learnability of the output distributions of local quantum circuits.
We show that for a wide variety of the most practically relevant learning algorithms -- including hybrid quantum-classical algorithms -- even the generative modelling problem associated with depth $d=\omega(\log(n))$ Clifford circuits is hard.
arXiv Detail & Related papers (2022-07-07T08:04:15Z) - Entanglement and coherence in Bernstein-Vazirani algorithm [58.720142291102135]
Bernstein-Vazirani algorithm allows one to determine a bit string encoded into an oracle.
We analyze in detail the quantum resources in the Bernstein-Vazirani algorithm.
We show that in the absence of entanglement, the performance of the algorithm is directly related to the amount of quantum coherence in the initial state.
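For context on what the Bernstein-Vazirani oracle computes (a sketch of the classical baseline, not of the quantum algorithm), recovering the hidden string $s$ from queries to $f(x) = s \cdot x \bmod 2$ classically takes $n$ queries, versus a single quantum query:

```python
def classical_bv(oracle, n):
    """Recover the hidden string s where oracle(x) = s.x mod 2.

    Classically this requires n oracle queries, one per standard basis
    vector; the Bernstein-Vazirani algorithm needs a single quantum query.
    """
    bits = []
    for i in range(n):
        x = [0] * n
        x[i] = 1            # query the i-th basis vector e_i
        bits.append(oracle(x))  # f(e_i) reveals the i-th bit of s
    return bits

s = [1, 0, 1, 1]
oracle = lambda x: sum(si * xi for si, xi in zip(s, x)) % 2
print(classical_bv(oracle, 4))  # → [1, 0, 1, 1]
```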
arXiv Detail & Related papers (2022-05-26T20:32:36Z) - Quantum exploration algorithms for multi-armed bandits [15.666205208594567]
We show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^{n}\Delta^{-2}_{i}}\bigr)$ quantum queries.
This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result.
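The quadratic saving in the query bound above can be made concrete by comparing the two budgets numerically for a given set of gaps $\Delta_i$ (a sketch ignoring log factors and constants; the function name is hypothetical):

```python
import math

def query_budgets(gaps):
    """Compare classical vs. quantum fixed-confidence query budgets
    for best-arm identification, up to log factors and constants.

    Classical: sum of Delta_i^{-2} over the suboptimal arms.
    Quantum (per the bound above): the square root of that sum.
    """
    classical = sum(d ** -2 for d in gaps)
    quantum = math.sqrt(classical)
    return classical, quantum

c, q = query_budgets([0.1, 0.2, 0.5])
print(c, q)  # → 129.0 and ≈ 11.36: a quadratic saving
```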
arXiv Detail & Related papers (2020-07-14T14:15:20Z) - Quantum Bandits [10.151012770913622]
We consider the quantum version of the bandit problem known as best arm identification (BAI).
We first propose a quantum modeling of the BAI problem, which assumes that both the learning agent and the environment are quantum.
We then propose an algorithm based on quantum amplitude amplification to solve BAI.
arXiv Detail & Related papers (2020-02-15T15:17:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.