Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
- URL: http://arxiv.org/abs/2302.10796v2
- Date: Thu, 13 Jun 2024 17:00:41 GMT
- Title: Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret
- Authors: Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, Liwei Wang
- Abstract summary: We propose a novel UCRL-style algorithm for quantum reinforcement learning (RL).
We establish an $\mathcal{O}(\mathrm{poly}(S, A, H, \log T))$ worst-case regret for it, where $T$ is the number of episodes.
Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation.
- Score: 23.418957451727255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with $S$ states, $A$ actions, and horizon $H$, and establish an $\mathcal{O}(\mathrm{poly}(S, A, H, \log T))$ worst-case regret for it, where $T$ is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which is capable of handling problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with $d$-dimensional linear representation and prove that it enjoys $\mathcal{O}(\mathrm{poly}(d, H, \log T))$ regret. Our algorithms are variants of UCRL/UCRL-VTR algorithms in classical RL, which also leverage a novel combination of lazy updating mechanisms and quantum estimation subroutines. This is the key to breaking the $\Omega(\sqrt{T})$-regret barrier in classical RL. To the best of our knowledge, this is the first work studying the online exploration in quantum RL with provable logarithmic worst-case regret.
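To make the two ingredients highlighted in the abstract concrete, here is a small, purely classical toy sketch (not the paper's algorithm): a doubling-style lazy update trigger that fires only logarithmically often, and a comparison of the classical $1/\sqrt{n}$ estimation rate with the $1/n$ rate that quantum mean-estimation subroutines achieve up to log factors. The function names, the trigger constant, and the data layout are illustrative assumptions.
```python
import math

# Toy comparison of estimation-error rates after n samples/queries:
# classical Monte Carlo averaging scales like 1/sqrt(n), while quantum
# mean-estimation subroutines scale like 1/n (up to log factors).
def classical_rate_error(n: int) -> float:
    return 1.0 / math.sqrt(n)

def quantum_rate_error(n: int) -> float:
    return 1.0 / n

# Lazy updating with a doubling trigger: re-estimate the model only when
# some (state, action) visit count has doubled since its last update.
# Over T episodes each pair triggers at most O(log T) updates.
def lazy_update_episodes(visit_counts_per_episode):
    """visit_counts_per_episode: list of dicts mapping (s, a) -> cumulative count."""
    count_at_last_update = {}
    update_episodes = []
    for t, counts in enumerate(visit_counts_per_episode):
        if any(n >= 2 * count_at_last_update.get(sa, 1) for sa, n in counts.items()):
            for sa, n in counts.items():
                count_at_last_update[sa] = n
            update_episodes.append(t)
    return update_episodes

# With a million samples the quantum-style rate is 1000x tighter, and a
# count that grows by one per episode triggers only about log2(T) updates.
print(classical_rate_error(10**6), quantum_rate_error(10**6))
counts = [{("s0", "a0"): t + 1} for t in range(1000)]
print(len(lazy_update_episodes(counts)))  # about log2(1000) ~ 10 updates
```
The intuition, consistent with the abstract's claim, is that each of the few policy switches can afford a fresh, more accurate estimate, so the $\sqrt{T}$ accumulation of estimation error in classical UCRL-style analyses is avoided.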
Related papers
- Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent [37.59299233291882]
We propose Q-Newton, a hybrid quantum-classical scheduler for accelerating neural network training with Newton's GD.
Q-Newton utilizes a streamlined scheduling module that coordinates between quantum and classical linear solvers.
Our evaluation showcases the potential for Q-Newton to significantly reduce the total training time compared to commonly used quantum machines.
arXiv Detail & Related papers (2024-04-30T23:55:03Z)
- Towards large-scale quantum optimization solvers with few qubits [59.63282173947468]
We introduce a variational quantum solver for optimizations over $m=\mathcal{O}(n^k)$ binary variables using only $n$ qubits, with tunable $k>1$.
We analytically prove that the specific qubit-efficient encoding brings in a super-polynomial mitigation of barren plateaus as a built-in feature.
arXiv Detail & Related papers (2024-01-17T18:59:38Z) - Online Learning Quantum States with the Logarithmic Loss via VB-FTRL [1.8856444568755568]
Online learning of quantum states with the logarithmic loss (LL-OLQS) has been a classic open problem in online learning for over three decades.
In this paper, we generalize VB-FTRL for LL-OLQS with moderate computational complexity.
Each iteration of the algorithm consists of a semidefinite program that can be implemented in polynomial time by, for example, cutting-plane methods.
arXiv Detail & Related papers (2023-11-06T15:45:33Z)
- Graph Neural Network Autoencoders for Efficient Quantum Circuit Optimisation [69.43216268165402]
We present for the first time how to use graph neural network (GNN) autoencoders for the optimisation of quantum circuits.
We construct directed acyclic graphs from the quantum circuits, encode the graphs and use the encodings to represent RL states.
Our method is a realistic first step towards very large-scale RL-based quantum circuit optimisation.
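As a rough illustration of the circuit-to-DAG step described in this entry (a generic sketch; the gate-list format and the use of networkx are assumptions, not the paper's pipeline):
```python
import networkx as nx

def circuit_to_dag(gates):
    """Build a directed acyclic graph from a gate list.

    `gates` is a sequence of (gate_name, qubit_indices) pairs in program
    order.  Each gate becomes a node; an edge is added from the previous
    gate acting on a qubit to the next gate acting on that same qubit,
    so edges follow the data flow along each qubit wire.
    """
    dag = nx.DiGraph()
    last_gate_on_qubit = {}
    for i, (name, qubits) in enumerate(gates):
        dag.add_node(i, gate=name, qubits=tuple(qubits))
        for q in qubits:
            if q in last_gate_on_qubit:
                dag.add_edge(last_gate_on_qubit[q], i, qubit=q)
            last_gate_on_qubit[q] = i
    return dag

# Example: H on qubit 0, CNOT(0, 1), then H on qubit 1.
dag = circuit_to_dag([("h", [0]), ("cx", [0, 1]), ("h", [1])])
print(nx.is_directed_acyclic_graph(dag), dag.number_of_edges())  # True 2
```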
arXiv Detail & Related papers (2023-03-06T16:51:30Z)
- Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z)
- Quantum Computing Provides Exponential Regret Improvement in Episodic Reinforcement Learning [35.11103784048256]
We propose an Upper Confidence Bound (UCB) based quantum algorithmic framework to facilitate learning of a finite-horizon MDP.
Our quantum algorithm achieves an exponential improvement in regret as compared to the classical counterparts.
arXiv Detail & Related papers (2023-02-16T23:01:27Z)
- Learning to predict arbitrary quantum processes [7.69390398476646]
We present an efficient machine learning (ML) algorithm for predicting any unknown quantum process over $n$ qubits.
Our results highlight the potential for ML models to predict the output of complex quantum dynamics much faster than the time needed to run the process itself.
arXiv Detail & Related papers (2022-10-26T17:52:59Z)
- A single $T$-gate makes distribution learning hard [56.045224655472865]
This work provides an extensive characterization of the learnability of the output distributions of local quantum circuits.
We show that for a wide variety of the most practically relevant learning algorithms -- including hybrid quantum-classical algorithms -- even the generative modelling problem associated with depth $d=\omega(\log(n))$ Clifford circuits is hard.
arXiv Detail & Related papers (2022-07-07T08:04:15Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Deterministic and Entanglement-Efficient Preparation of Amplitude-Encoded Quantum Registers [0.533024001730262]
A classical vector $\mathbf{b}$ is encoded in the amplitudes of a quantum state.
An arbitrary state of $Q$ qubits generally requires approximately $2^Q$ entangling gates.
We present a deterministic (nonvariational) algorithm that allows one to flexibly reduce the quantum resources required for state preparation.
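A minimal sketch of the amplitude-encoding convention referred to above, assuming a plain NumPy representation of the state vector (this is not the paper's preparation algorithm, which targets the entangling-gate cost):
```python
import numpy as np

def amplitude_encode(b: np.ndarray) -> np.ndarray:
    """Return the 2^Q amplitudes encoding the classical vector b.

    The vector is padded to the next power of two and L2-normalised;
    entry i becomes the amplitude of basis state |i>.
    """
    dim = 1 << (len(b) - 1).bit_length()   # next power of two
    padded = np.zeros(dim, dtype=complex)
    padded[: len(b)] = b
    return padded / np.linalg.norm(padded)

# A Q-qubit register holds 2^Q amplitudes: a length-5 vector is padded
# to 2^3 = 8 amplitudes, i.e. Q = 3 qubits.
state = amplitude_encode(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
print(len(state), np.isclose(np.linalg.norm(state), 1.0))  # 8 True
```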
arXiv Detail & Related papers (2021-10-26T07:37:54Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of states.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.