Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy
Logarithmic Regrets
- URL: http://arxiv.org/abs/2205.14988v1
- Date: Mon, 30 May 2022 10:54:53 GMT
- Title: Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy
Logarithmic Regrets
- Authors: Zongqi Wan, Zhijie Zhang, Tongyang Li, Jialin Zhang, Xiaoming Sun
- Abstract summary: Multi-armed bandit (MAB) and stochastic linear bandit (SLB) are important models in reinforcement learning.
We propose quantum algorithms for both models with $O(\mbox{poly}(\log T))$ regrets.
This is the first provable quantum speedup for regrets of bandit problems and in general exploitation in reinforcement learning.
- Score: 20.426493569846855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-arm bandit (MAB) and stochastic linear bandit (SLB) are important
models in reinforcement learning, and it is well-known that classical
algorithms for bandits with time horizon $T$ suffer $\Omega(\sqrt{T})$ regret.
In this paper, we study MAB and SLB with quantum reward oracles and propose
quantum algorithms for both models with $O(\mbox{poly}(\log T))$ regrets,
exponentially improving the dependence in terms of $T$. To the best of our
knowledge, this is the first provable quantum speedup for regrets of bandit
problems and in general exploitation in reinforcement learning. Compared to
previous literature on quantum exploration algorithms for MAB and reinforcement
learning, our quantum input model is simpler and only assumes quantum oracles
for each individual arm.
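As a classical point of reference for the $\Omega(\sqrt{T})$ baseline the abstract improves upon (not code from the paper itself), here is a minimal UCB1 sketch on Bernoulli arms; all names and parameters are illustrative assumptions:

```python
import math
import random

def ucb1(means, horizon, rng):
    """Minimal UCB1 on Bernoulli arms; returns cumulative pseudo-regret.

    Classical index policies of this kind incur Theta(sqrt(T)) worst-case
    regret in T, which is the baseline the quantum algorithms above
    improve to poly(log T).
    """
    n = len(means)
    counts = [0] * n      # pulls per arm
    sums = [0.0] * n      # total reward per arm
    regret = 0.0
    best = max(means)
    for t in range(horizon):
        if t < n:
            arm = t  # play each arm once to initialize
        else:
            # index = empirical mean + exploration bonus
            arm = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t + 1) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

rng = random.Random(0)
print(ucb1([0.5, 0.6, 0.9], 5000, rng))
```

On a run like the one above, the cumulative regret stays far below the horizon, illustrating the sublinear (but still polynomial-in-$T$) classical rate.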
Related papers
- Hybrid Quantum-Classical Machine Learning with String Diagrams [49.1574468325115]
This paper develops a formal framework for describing hybrid algorithms in terms of string diagrams.
A notable feature of our string diagrams is the use of functor boxes, which correspond to quantum-classical interfaces.
arXiv Detail & Related papers (2024-07-04T06:37:16Z) - A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games [102.46640028830441]
We introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ to $\epsilon$-Nash equilibria.
This quadratic speed-up sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
arXiv Detail & Related papers (2023-11-17T20:38:38Z) - Quantum Bayesian Optimization [64.58749619145908]
We introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm.
It is the first BO algorithm able to achieve a regret upper bound of $O(\mbox{polylog}\, T)$, which is significantly smaller than its regret lower bound of $\Omega(\sqrt{T})$ in the classical setting.
Thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm.
arXiv Detail & Related papers (2023-10-09T03:10:42Z) - Quantum Heavy-tailed Bandits [36.458771174473924]
We study multi-armed bandits (MAB) and stochastic linear bandits (SLB) with heavy-tailed rewards and quantum reward oracles.
We first propose a new quantum mean estimator for heavy-tailed distributions, which is based on the Quantum Monte Carlo Estimator.
Based on our quantum mean estimator, we focus on quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the Upper Confidence Bound (UCB) framework.
arXiv Detail & Related papers (2023-01-23T19:23:10Z) - Quantum Worst-Case to Average-Case Reductions for All Linear Problems [66.65497337069792]
We study the problem of designing worst-case to average-case reductions for quantum algorithms.
We provide an explicit and efficient transformation of quantum algorithms that are only correct on a small fraction of their inputs into ones that are correct on all inputs.
arXiv Detail & Related papers (2022-12-06T22:01:49Z) - Testing and Learning Quantum Juntas Nearly Optimally [3.030969076856776]
We consider the problem of testing and learning quantum $k$-juntas.
We give (a) a $\widetilde{O}(\sqrt{k})$-query quantum algorithm that can distinguish quantum $k$-juntas from unitary matrices that are "far" from every quantum $k$-junta.
We complement our upper bounds for testing quantum $k$-juntas and learning quantum $k$-juntas with near-matching lower bounds of $\Omega(\sqrt{k})$ and $\Omega(\frac
arXiv Detail & Related papers (2022-07-13T00:11:14Z) - A single $T$-gate makes distribution learning hard [56.045224655472865]
This work provides an extensive characterization of the learnability of the output distributions of local quantum circuits.
We show that for a wide variety of the most practically relevant learning algorithms -- including hybrid quantum-classical algorithms -- even the generative modelling problem associated with depth $d=\omega(\log(n))$ Clifford circuits is hard.
arXiv Detail & Related papers (2022-07-07T08:04:15Z) - Entanglement and coherence in Bernstein-Vazirani algorithm [58.720142291102135]
Bernstein-Vazirani algorithm allows one to determine a bit string encoded into an oracle.
We analyze in detail the quantum resources in the Bernstein-Vazirani algorithm.
We show that in the absence of entanglement, the performance of the algorithm is directly related to the amount of quantum coherence in the initial state.
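For context on what the Bernstein-Vazirani oracle computes (a sketch of the classical baseline, not of the quantum algorithm), recovering the hidden string $s$ from queries to $f(x) = s \cdot x \bmod 2$ classically takes $n$ queries, versus a single quantum query:

```python
def classical_bv(oracle, n):
    """Recover the hidden string s where oracle(x) = s.x mod 2.

    Classically this requires n oracle queries, one per standard basis
    vector; the Bernstein-Vazirani algorithm needs a single quantum query.
    """
    bits = []
    for i in range(n):
        x = [0] * n
        x[i] = 1            # query the i-th basis vector e_i
        bits.append(oracle(x))  # f(e_i) reveals the i-th bit of s
    return bits

s = [1, 0, 1, 1]
oracle = lambda x: sum(si * xi for si, xi in zip(s, x)) % 2
print(classical_bv(oracle, 4))  # → [1, 0, 1, 1]
```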
arXiv Detail & Related papers (2022-05-26T20:32:36Z) - Quantum exploration algorithms for multi-armed bandits [15.666205208594567]
We show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^{n}\Delta^{-2}_{i}}\bigr)$ quantum queries.
This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result.
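The quadratic saving in the query bound above can be made concrete by comparing the two budgets numerically for a given set of gaps $\Delta_i$ (a sketch ignoring log factors and constants; the function name is hypothetical):

```python
import math

def query_budgets(gaps):
    """Compare classical vs. quantum fixed-confidence query budgets
    for best-arm identification, up to log factors and constants.

    Classical: sum of Delta_i^{-2} over the suboptimal arms.
    Quantum (per the bound above): the square root of that sum.
    """
    classical = sum(d ** -2 for d in gaps)
    quantum = math.sqrt(classical)
    return classical, quantum

c, q = query_budgets([0.1, 0.2, 0.5])
print(c, q)  # → 129.0 and ≈ 11.36: a quadratic saving
```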
arXiv Detail & Related papers (2020-07-14T14:15:20Z) - Quantum Bandits [10.151012770913622]
We consider the quantum version of the bandit problem known as best arm identification (BAI).
We first propose a quantum modeling of the BAI problem, which assumes that both the learning agent and the environment are quantum.
We then propose an algorithm based on quantum amplitude amplification to solve BAI.
arXiv Detail & Related papers (2020-02-15T15:17:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.