Momentum Q-learning with Finite-Sample Convergence Guarantee
- URL: http://arxiv.org/abs/2007.15418v1
- Date: Thu, 30 Jul 2020 12:27:03 GMT
- Title: Momentum Q-learning with Finite-Sample Convergence Guarantee
- Authors: Bowen Weng, Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang
- Abstract summary: This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
- Score: 49.38471009162477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing studies indicate that momentum ideas in conventional optimization
can be used to improve the performance of Q-learning algorithms. However, the
finite-sample analysis for momentum-based Q-learning algorithms is only
available for the tabular case without function approximations. This paper
analyzes a class of momentum-based Q-learning algorithms with finite-sample
guarantee. Specifically, we propose the MomentumQ algorithm, which integrates
Nesterov's and Polyak's momentum schemes and generalizes the existing
momentum-based Q-learning algorithms. For the infinite state-action space case,
we establish the convergence guarantee for MomentumQ with linear function
approximations and Markovian sampling. In particular, we characterize the
finite-sample convergence rate which is provably faster than the vanilla
Q-learning. This is the first finite-sample analysis for momentum-based
Q-learning algorithms with function approximations. For the tabular case under
synchronous sampling, we also obtain a finite-sample convergence rate that is
slightly better than the SpeedyQ \citep{azar2011speedy} when choosing a special
family of step sizes. Finally, we demonstrate through various experiments that
the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
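As a rough illustration of the momentum idea (not the paper's exact MomentumQ update, which combines Nesterov's and Polyak's schemes in a specific way), a tabular Q-learning step with a Polyak-style heavy-ball term might look like the following; the step size, momentum coefficient, and discount factor are illustrative:

```python
import numpy as np

def momentum_q_update(Q, M, s, a, r, s_next, alpha=0.1, beta=0.9, gamma=0.99):
    """One tabular Q-learning step smoothed by a heavy-ball momentum term.

    Q: (num_states, num_actions) value table
    M: momentum buffer of the same shape
    Generic momentum variant for illustration only; the paper's MomentumQ
    integrates both Nesterov's and Polyak's momentum schemes.
    """
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    M[s, a] = beta * M[s, a] + td_error   # accumulate the TD error
    Q[s, a] += alpha * M[s, a]            # update along the momentum direction
    return Q, M
```

The momentum buffer averages past TD errors, which is the mechanism momentum-based variants use to accelerate over vanilla Q-learning.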
Related papers
- Application of Langevin Dynamics to Advance the Quantum Natural Gradient Optimization Algorithm [47.47843839099175]
A Quantum Natural Gradient (QNG) algorithm for optimization of variational quantum circuits has been proposed recently.
In this study, we employ the Langevin equation with a QNG force to demonstrate that its discrete-time solution gives a generalized form, which we call Momentum-QNG.
arXiv Detail & Related papers (2024-09-03T15:21:16Z) - Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets)
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z) - PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash
Equilibrium [62.51015395213579]
We consider the non-convex non-concave objective function in two-player zero-sum continuous games.
The proposed algorithm employs the movements of particles to represent the updates of random strategies for the $\epsilon$-mixed Nash equilibrium.
arXiv Detail & Related papers (2023-03-02T05:08:15Z) - A Sublinear-Time Quantum Algorithm for Approximating Partition Functions [0.0]
We present a novel quantum algorithm for estimating Gibbs partition functions in sublinear time.
This is the first speed-up of this type to be obtained over the seminal nearly-linear time algorithm of Štefankovič, Vempala and Vigoda.
arXiv Detail & Related papers (2022-07-18T14:41:48Z) - Finite Horizon Q-learning: Stability, Convergence and Simulations [0.0]
We develop a version of the Q-learning algorithm for finite horizon Markov decision processes (MDPs).
Our analysis of the stability and convergence of finite horizon Q-learning is based entirely on the ordinary differential equation (ODE) method.
arXiv Detail & Related papers (2021-10-27T16:18:44Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
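For reference, the standard tabular double Q-learning update keeps two value tables, selecting the greedy next action with one table and evaluating it with the other (the variable names below are our own, illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One double Q-learning step: randomly pick a table to update, using
    it to select the argmax action but the other table to evaluate it.
    Decoupling selection from evaluation reduces the overestimation bias
    of vanilla Q-learning."""
    if rng.random() < 0.5:
        a_star = np.argmax(QA[s_next])
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        b_star = np.argmax(QB[s_next])
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])
    return QA, QB
```

Exactly one table changes per step; the cited finite-time analysis covers both this asynchronous form and a synchronous variant that updates all state-action pairs at once.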
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - K-spin Hamiltonian for quantum-resolvable Markov decision processes [0.0]
We derive a pseudo-Boolean cost function that is equivalent to a K-spin Hamiltonian representation of the discrete, finite, discounted Markov decision process.
This K-spin Hamiltonian furnishes a starting point from which to solve for an optimal policy using quantum algorithms.
arXiv Detail & Related papers (2020-04-13T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.