Quantum Computing Provides Exponential Regret Improvement in Episodic
Reinforcement Learning
- URL: http://arxiv.org/abs/2302.08617v1
- Date: Thu, 16 Feb 2023 23:01:27 GMT
- Title: Quantum Computing Provides Exponential Regret Improvement in Episodic
Reinforcement Learning
- Authors: Bhargav Ganguly and Yulian Wu and Di Wang and Vaneet Aggarwal
- Abstract summary: We propose an textitUpper Confidence Bound (UCB) based quantum algorithmic framework to facilitate learning of a finite-horizon MDP.
Our quantum algorithm achieves an exponential improvement in regret as compared to the classical counterparts.
- Score: 35.11103784048256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the problem of \textit{episodic reinforcement
learning} with quantum oracles for state evolution. To this end, we propose an
\textit{Upper Confidence Bound} (UCB) based quantum algorithmic framework to
facilitate learning of a finite-horizon MDP. Our quantum algorithm achieves an
exponential improvement in regret as compared to the classical counterparts,
achieving a regret of $\Tilde{\mathcal{O}}(1)$ as compared to
$\Tilde{\mathcal{O}}(\sqrt{K})$ \footnote{$\Tilde{\mathcal{O}}(\cdot)$ hides
logarithmic terms.}, $K$ being the number of training episodes. In order to
achieve this advantage, we exploit efficient quantum mean estimation technique
that provides quadratic improvement in the number of i.i.d. samples needed to
estimate the mean of sub-Gaussian random variables as compared to classical
mean estimation. This improvement is a key to the significant regret
improvement in quantum reinforcement learning. We provide proof-of-concept
experiments on various RL environments that in turn demonstrate performance
gains of the proposed algorithmic framework.
Related papers
- Estimating quantum amplitudes can be exponentially improved [11.282486674587236]
Estimating quantum amplitudes is a fundamental task in quantum computing.
We present a novel framework for estimating quantum amplitudes by transforming pure states into their matrix forms.
Our framework achieves the standard quantum limit $epsilon-2$ and the Heisenberg limit $epsilon-1$, respectively.
arXiv Detail & Related papers (2024-08-25T04:35:53Z) - Towards large-scale quantum optimization solvers with few qubits [59.63282173947468]
We introduce a variational quantum solver for optimizations over $m=mathcalO(nk)$ binary variables using only $n$ qubits, with tunable $k>1$.
We analytically prove that the specific qubit-efficient encoding brings in a super-polynomial mitigation of barren plateaus as a built-in feature.
arXiv Detail & Related papers (2024-01-17T18:59:38Z) - A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games [102.46640028830441]
We introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $mathcalO(d/epsilon)$ to $epsilon$-Nash equilibria.
This quadratic speed-up sets a new benchmark for computing $epsilon$-Nash equilibria in quantum zero-sum games.
arXiv Detail & Related papers (2023-11-17T20:38:38Z) - Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients [51.82488018573326]
We present QP-SBGD, a novel layer-wise optimiser tailored towards training neural networks with binary weights.
BNNs reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy.
Our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware.
arXiv Detail & Related papers (2023-10-23T17:32:38Z) - Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes [32.07657827173262]
We introduce an innovative quantum framework for the agent's engagement with an unknown MDP.
We show that the quantum advantage in mean estimation leads to exponential advancements in regret guarantees for infinite horizon Reinforcement Learning.
arXiv Detail & Related papers (2023-10-18T03:17:51Z) - Quantum Bayesian Optimization [64.58749619145908]
We introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm.
It is the first BO algorithm able to achieve a regret upper bound of O(polylog T), which is significantly smaller than its regret lower bound of Omega(sqrt(T)) in the classical setting.
Thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm.
arXiv Detail & Related papers (2023-10-09T03:10:42Z) - Alleviating the quantum Big-$M$ problem [0.237499051649312]
Classically known as the "Big-$M$" problem, it affects the physical energy scale.
We take a systematic, encompassing look at the quantum big-$M$ problem, revealing NP-hardness in finding the optimal $M$.
We propose a practical translation algorithm, based on SDP relaxation, that outperforms previous methods in numerical benchmarks.
arXiv Detail & Related papers (2023-07-19T18:00:05Z) - Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret [23.418957451727255]
We propose a novel UCRL-style algorithm for quantum reinforcement learning (RL)
We establish an $mathcalO(mathrmpoly(S, A, H, log T))$ worst-case regret for it, where $T$ is the number of episodes.
Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation.
arXiv Detail & Related papers (2023-02-21T16:23:11Z) - F-Divergences and Cost Function Locality in Generative Modelling with
Quantum Circuits [0.0]
We consider training a quantum circuit Born machine using $f$-divergences.
We introduce two algorithms which demonstrably improve the training of the Born machine.
We discuss the long-term implications of quantum devices for computing $f$-divergences.
arXiv Detail & Related papers (2021-10-08T17:04:18Z) - Realization of arbitrary doubly-controlled quantum phase gates [62.997667081978825]
We introduce a high-fidelity gate set inspired by a proposal for near-term quantum advantage in optimization problems.
By orchestrating coherent, multi-level control over three transmon qutrits, we synthesize a family of deterministic, continuous-angle quantum phase gates acting in the natural three-qubit computational basis.
arXiv Detail & Related papers (2021-08-03T17:49:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.