Quantum Algorithms for Reinforcement Learning with a Generative Model
- URL: http://arxiv.org/abs/2112.08451v1
- Date: Wed, 15 Dec 2021 19:51:49 GMT
- Title: Quantum Algorithms for Reinforcement Learning with a Generative Model
- Authors: Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin
Roetteler
- Abstract summary: Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward.
We design quantum algorithms that approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$).
We show that our quantum algorithm for computing $q*$ is optimal by proving a matching quantum lower bound.
- Score: 16.168901236223117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning studies how an agent should interact with an
environment to maximize its cumulative reward. A standard way to study this
question abstractly is to ask how many samples an agent needs from the
environment to learn an optimal policy for a $\gamma$-discounted Markov
decision process (MDP). For such an MDP, we design quantum algorithms that
approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$),
and the optimal $Q$-function ($q^*$), assuming the algorithms can access
samples from the environment in quantum superposition. This assumption is
justified whenever there exists a simulator for the environment; for example,
if the environment is a video game or some other program. Our quantum
algorithms, inspired by value iteration, achieve quadratic speedups over the
best-possible classical sample complexities in the approximation accuracy
($\epsilon$) and two main parameters of the MDP: the effective time horizon
($\frac{1}{1-\gamma}$) and the size of the action space ($A$). Moreover, we
show that our quantum algorithm for computing $q^*$ is optimal by proving a
matching quantum lower bound.
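To make the classical baseline concrete, here is a minimal sketch (our own illustration, not the authors' code) of sample-based value iteration with a generative model; the names `sample_next_state` and `rewards` are hypothetical. Each Bellman backup estimates $\mathbb{E}[v(s')]$ by a Monte Carlo average over simulator draws, and it is plausibly this mean-estimation step (plus the max over actions) that the quantum algorithms accelerate, yielding the quadratic savings in $\epsilon$, $\frac{1}{1-\gamma}$, and $A$.

```python
import numpy as np

def sampled_value_iteration(sample_next_state, rewards, S, A, gamma,
                            n_samples=100, n_iters=50, seed=0):
    """Classical sample-based value iteration with a generative model.

    sample_next_state(s, a, rng) draws s' ~ P(.|s, a) from the simulator
    (hypothetical callback); rewards[s, a] is the known reward table.
    Each backup averages n_samples draws to estimate E[v(s')] -- the step
    whose sample cost the quantum algorithm improves quadratically.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(n_iters):
        for s in range(S):
            for a in range(A):
                nxt = [sample_next_state(s, a, rng) for _ in range(n_samples)]
                q[s, a] = rewards[s, a] + gamma * v[nxt].mean()
        v = q.max(axis=1)          # approximates v*(s) = max_a q*(s, a)
    return v, q, q.argmax(axis=1)  # greedy policy approximates pi*
```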
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
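As a rough illustration of the regression at the algorithm's core (a generic sketch with hypothetical names, not the paper's exact procedure): value-targeted regression fits the unknown model parameter by regressing backed-up value targets onto features, with weights that discount uncertain samples, and an elliptical bonus can serve as the exploration-driven pseudo-reward.

```python
import numpy as np

def weighted_value_targeted_regression(Phi, targets, weights, lam=1.0):
    """Uncertainty-weighted ridge regression (generic sketch).

    Phi: (n, d) features phi(s_k, a_k); targets: (n,) backed-up value
    targets, e.g. V(s_{k+1}); weights: (n,) down-weight uncertain samples.
    Solves min_theta sum_k w_k (phi_k . theta - y_k)^2 + lam ||theta||^2.
    """
    Wsqrt = np.sqrt(weights)[:, None]
    A = (Wsqrt * Phi).T @ (Wsqrt * Phi) + lam * np.eye(Phi.shape[1])
    theta = np.linalg.solve(A, Phi.T @ (weights * targets))
    return theta, A

def pseudo_reward(phi, A):
    """Exploration-driven pseudo-reward: elliptical bonus ||phi||_{A^{-1}}."""
    return float(np.sqrt(phi @ np.linalg.solve(A, phi)))
```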
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- A Quadratic Speedup in the Optimization of Noisy Quantum Optical Circuits [5.074606924176912]
Linear optical quantum circuits with photon number resolving (PNR) detectors are used both for Gaussian Boson Sampling (GBS) and for the preparation of non-Gaussian states such as Gottesman-Kitaev-Preskill (GKP) states.
We introduce a family of algorithms that calculate detection probabilities, conditional states and their gradients with respect to circuit parametrizations.
These algorithms are implemented and ready to use in the open-source photonic optimization library MrMustard.
arXiv Detail & Related papers (2023-03-15T18:51:36Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, given as the solution of an intricate non-convex optimization program, can be used as a starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- The Quantum Approximate Optimization Algorithm performance with low entanglement and high circuit depth [0.0]
Variational quantum algorithms constitute one of the most widespread methods for using current noisy quantum computers.
We investigate entanglement's role in these methods for solving optimization problems.
We conclude that entanglement plays a minor role in the MaxCut and Exact Cover 3 problems studied here.
arXiv Detail & Related papers (2022-07-07T16:21:36Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- Variational Quantum Algorithms for Semidefinite Programming [3.481985817302898]
A semidefinite program (SDP) is a convex optimization problem with applications in operations research, optimization, quantum information science, and beyond.
We propose variational quantum algorithms for approximately solving SDPs.
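For intuition, here is a purely classical analogue of the variational idea (our own sketch; the paper prepares the candidate solution with a parameterized quantum circuit, which this does not model): represent the PSD variable through a factorization so positivity holds by construction, and penalize constraint violations.

```python
import numpy as np

def sdp_penalty_objective(V, C, As, bs, penalty=10.0):
    """Penalized objective for min tr(C X) s.t. tr(A_i X) = b_i, X >= 0.

    X = V V^dagger is PSD by construction (the variational ansatz);
    equality-constraint violations are added as quadratic penalties,
    so the objective can be minimized over V by any local optimizer.
    """
    X = V @ V.conj().T
    val = np.trace(C @ X).real
    for A_i, b_i in zip(As, bs):
        val += penalty * (np.trace(A_i @ X).real - b_i) ** 2
    return val
```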
arXiv Detail & Related papers (2021-12-16T13:10:48Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a polynomial number of gradient-ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Operator Sampling for Shot-frugal Optimization in Variational Algorithms [0.860968116787044]
We introduce a strategy for reducing the number of measurements (i.e., shots) by randomly sampling operators $h_i$ from the overall Hamiltonian $H = \sum_i c_i h_i$.
We implement this and other sampling strategies to find the ground states of the molecules H$_2$, LiH, and BeH$_2$, with and without quantum hardware noise; Rosalin outperforms the other strategies in most cases.
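A minimal sketch of the weighted-sampling idea underlying such shot-frugal strategies (our illustration; `measure_term` is a hypothetical callback, and this does not reproduce Rosalin's full adaptive shot allocation): draw terms with probability proportional to $|c_i|$ and reweight the outcomes so the energy estimate stays unbiased.

```python
import numpy as np

def sampled_energy(coeffs, measure_term, total_shots, seed=0):
    """Estimate <H> for H = sum_i c_i h_i under a fixed shot budget.

    measure_term(i) is a hypothetical callback returning one single-shot
    outcome of term h_i on the current state. Terms are drawn with
    probability p_i = |c_i| / sum_j |c_j|; each outcome is reweighted by
    c_i / p_i, so E[estimate] = sum_i c_i <h_i> = <H> (unbiased).
    """
    rng = np.random.default_rng(seed)
    c = np.asarray(coeffs, dtype=float)
    p = np.abs(c) / np.abs(c).sum()
    idx = rng.choice(len(c), size=total_shots, p=p)
    outcomes = np.array([measure_term(i) for i in idx])
    return np.mean(outcomes * c[idx] / p[idx])
```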
arXiv Detail & Related papers (2020-04-14T01:00:07Z)
- Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the peak-constrained Markov decision process (PCMDP), where the agent chooses a policy to maximize total reward over a finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP problem into an unconstrained problem, to which a Q-learning based approach is applied.
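A minimal hypothetical sketch of the conversion idea (our illustration; the paper's actual construction and guarantees differ in detail): fold the peak constraint into the reward so that constraint-violating actions are dominated, then run standard tabular Q-learning on the unconstrained problem.

```python
import numpy as np

def shaped_reward(reward, constraint_ok, penalty=1e3):
    """Convert a peak-constrained MDP into an unconstrained one by
    heavily penalizing constraint-violating state-action pairs."""
    return reward if constraint_ok else reward - penalty

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update applied to the shaped reward."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```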
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.