Quantum Algorithms for Reinforcement Learning with a Generative Model
- URL: http://arxiv.org/abs/2112.08451v1
- Date: Wed, 15 Dec 2021 19:51:49 GMT
- Title: Quantum Algorithms for Reinforcement Learning with a Generative Model
- Authors: Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin
Roetteler
- Abstract summary: Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward.
We design quantum algorithms that approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$), and the optimal $Q$-function ($q^*$).
We show that our quantum algorithm for computing $q*$ is optimal by proving a matching quantum lower bound.
- Score: 16.168901236223117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning studies how an agent should interact with an
environment to maximize its cumulative reward. A standard way to study this
question abstractly is to ask how many samples an agent needs from the
environment to learn an optimal policy for a $\gamma$-discounted Markov
decision process (MDP). For such an MDP, we design quantum algorithms that
approximate an optimal policy ($\pi^*$), the optimal value function ($v^*$),
and the optimal $Q$-function ($q^*$), assuming the algorithms can access
samples from the environment in quantum superposition. This assumption is
justified whenever there exists a simulator for the environment; for example,
if the environment is a video game or some other program. Our quantum
algorithms, inspired by value iteration, achieve quadratic speedups over the
best-possible classical sample complexities in the approximation accuracy
($\epsilon$) and two main parameters of the MDP: the effective time horizon
($\frac{1}{1-\gamma}$) and the size of the action space ($A$). Moreover, we
show that our quantum algorithm for computing $q^*$ is optimal by proving a
matching quantum lower bound.
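To make the classical baseline concrete, here is a minimal sketch (our own illustration, not the authors' code) of sample-based value iteration with a generative model; the names `sample_next_state` and `rewards` are hypothetical. Each Bellman backup estimates $\mathbb{E}[v(s')]$ by a Monte Carlo average over simulator draws, and it is plausibly this mean-estimation step (plus the max over actions) that the quantum algorithms accelerate, yielding the quadratic savings in $\epsilon$, $\frac{1}{1-\gamma}$, and $A$.

```python
import numpy as np

def sampled_value_iteration(sample_next_state, rewards, S, A, gamma,
                            n_samples=100, n_iters=50, seed=0):
    """Classical sample-based value iteration with a generative model.

    sample_next_state(s, a, rng) draws s' ~ P(.|s, a) from the simulator
    (hypothetical callback); rewards[s, a] is the known reward table.
    Each backup averages n_samples draws to estimate E[v(s')] -- the step
    whose sample cost the quantum algorithm improves quadratically.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(n_iters):
        for s in range(S):
            for a in range(A):
                nxt = [sample_next_state(s, a, rng) for _ in range(n_samples)]
                q[s, a] = rewards[s, a] + gamma * v[nxt].mean()
        v = q.max(axis=1)          # approximates v*(s) = max_a q*(s, a)
    return v, q, q.argmax(axis=1)  # greedy policy approximates pi*
```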
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
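As a rough illustration of the regression at the algorithm's core (a generic sketch with hypothetical names, not the paper's exact procedure): value-targeted regression fits the unknown model parameter by regressing backed-up value targets onto features, with weights that discount uncertain samples, and an elliptical bonus can serve as the exploration-driven pseudo-reward.

```python
import numpy as np

def weighted_value_targeted_regression(Phi, targets, weights, lam=1.0):
    """Uncertainty-weighted ridge regression (generic sketch).

    Phi: (n, d) features phi(s_k, a_k); targets: (n,) backed-up value
    targets, e.g. V(s_{k+1}); weights: (n,) down-weight uncertain samples.
    Solves min_theta sum_k w_k (phi_k . theta - y_k)^2 + lam ||theta||^2.
    """
    Wsqrt = np.sqrt(weights)[:, None]
    A = (Wsqrt * Phi).T @ (Wsqrt * Phi) + lam * np.eye(Phi.shape[1])
    theta = np.linalg.solve(A, Phi.T @ (weights * targets))
    return theta, A

def pseudo_reward(phi, A):
    """Exploration-driven pseudo-reward: elliptical bonus ||phi||_{A^{-1}}."""
    return float(np.sqrt(phi @ np.linalg.solve(A, phi)))
```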
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- A Quadratic Speedup in the Optimization of Noisy Quantum Optical Circuits [5.074606924176912]
Linear optical quantum circuits with photon number resolving (PNR) detectors are used both for Gaussian Boson Sampling (GBS) and for the preparation of non-Gaussian states such as Gottesman-Kitaev-Preskill (GKP) states.
We introduce a family of algorithms that calculate detection probabilities, conditional states and their gradients with respect to circuit parametrizations.
These algorithms are implemented and ready to use in the open-source photonic optimization library MrMustard.
arXiv Detail & Related papers (2023-03-15T18:51:36Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov decision processes (MDPs) in the fixed-confidence setting under a generative model.
The lower bound, given as the solution of an intricate non-convex optimization program, can be used as a starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- The Quantum Approximate Optimization Algorithm performance with low entanglement and high circuit depth [0.0]
Variational quantum algorithms constitute one of the most widespread methods for using current noisy quantum computers.
We investigate entanglement's role in these methods for solving optimization problems.
We conclude that entanglement plays a minor role in the MaxCut and Exact Cover 3 problems studied here.
arXiv Detail & Related papers (2022-07-07T16:21:36Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- Variational Quantum Algorithms for Semidefinite Programming [3.481985817302898]
A semidefinite program (SDP) is a convex optimization problem with applications in operations research, optimization, quantum information science, and beyond.
We propose variational quantum algorithms for approximately solving SDPs.
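For intuition, here is a purely classical analogue of the variational idea (our own sketch; the paper prepares the candidate solution with a parameterized quantum circuit, which this does not model): represent the PSD variable through a factorization so positivity holds by construction, and penalize constraint violations.

```python
import numpy as np

def sdp_penalty_objective(V, C, As, bs, penalty=10.0):
    """Penalized objective for min tr(C X) s.t. tr(A_i X) = b_i, X >= 0.

    X = V V^dagger is PSD by construction (the variational ansatz);
    equality-constraint violations are added as quadratic penalties,
    so the objective can be minimized over V by any local optimizer.
    """
    X = V @ V.conj().T
    val = np.trace(C @ X).real
    for A_i, b_i in zip(As, bs):
        val += penalty * (np.trace(A_i @ X).real - b_i) ** 2
    return val
```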
arXiv Detail & Related papers (2021-12-16T13:10:48Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a polynomial number of gradient-ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Operator Sampling for Shot-frugal Optimization in Variational Algorithms [0.860968116787044]
We introduce a strategy for reducing the number of measurements (i.e., shots) by randomly sampling operators $h_i$ from the overall Hamiltonian $H = \sum_i c_i h_i$.
We implement this and other sampling strategies to find the ground states of the molecules H$_2$, LiH, and BeH$_2$, with and without quantum hardware noise; Rosalin outperforms the other strategies in most cases.
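A minimal sketch of the weighted-sampling idea underlying such shot-frugal strategies (our illustration; `measure_term` is a hypothetical callback, and this does not reproduce Rosalin's full adaptive shot allocation): draw terms with probability proportional to $|c_i|$ and reweight the outcomes so the energy estimate stays unbiased.

```python
import numpy as np

def sampled_energy(coeffs, measure_term, total_shots, seed=0):
    """Estimate <H> for H = sum_i c_i h_i under a fixed shot budget.

    measure_term(i) is a hypothetical callback returning one single-shot
    outcome of term h_i on the current state. Terms are drawn with
    probability p_i = |c_i| / sum_j |c_j|; each outcome is reweighted by
    c_i / p_i, so E[estimate] = sum_i c_i <h_i> = <H> (unbiased).
    """
    rng = np.random.default_rng(seed)
    c = np.asarray(coeffs, dtype=float)
    p = np.abs(c) / np.abs(c).sum()
    idx = rng.choice(len(c), size=total_shots, p=p)
    outcomes = np.array([measure_term(i) for i in idx])
    return np.mean(outcomes * c[idx] / p[idx])
```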
arXiv Detail & Related papers (2020-04-14T01:00:07Z)
- Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the peak-constrained Markov decision process (PCMDP), where the agent chooses a policy to maximize total reward over a finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP problem into an unconstrained problem, to which a Q-learning based approach is applied.
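A minimal hypothetical sketch of the conversion idea (our illustration; the paper's actual construction and guarantees differ in detail): fold the peak constraint into the reward so that constraint-violating actions are dominated, then run standard tabular Q-learning on the unconstrained problem.

```python
import numpy as np

def shaped_reward(reward, constraint_ok, penalty=1e3):
    """Convert a peak-constrained MDP into an unconstrained one by
    heavily penalizing constraint-violating state-action pairs."""
    return reward if constraint_ok else reward - penalty

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update applied to the shaped reward."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```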
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.