Analysis of Q-learning with Adaptation and Momentum Restart for Gradient
Descent
- URL: http://arxiv.org/abs/2007.07422v1
- Date: Wed, 15 Jul 2020 01:11:43 GMT
- Title: Analysis of Q-learning with Adaptation and Momentum Restart for Gradient
Descent
- Authors: Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang
- Abstract summary: We first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update.
To further improve the performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm.
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates.
- Score: 47.3692506462581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing convergence analyses of Q-learning mostly focus on the vanilla
stochastic gradient descent (SGD) type of updates. Although Adaptive Moment
Estimation (Adam) is commonly used in practical Q-learning algorithms, no
convergence guarantee has been provided for Q-learning with this type of
update. In this paper, we first characterize the convergence rate for
Q-AMSGrad, which is the Q-learning algorithm with the AMSGrad update (a
commonly adopted alternative to Adam for theoretical analysis). To further
improve the performance, we propose incorporating a momentum restart scheme
into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The
convergence rate of Q-AMSGradR is also established. Our experiments on a
linear quadratic regulator problem show that the two proposed Q-learning
algorithms outperform vanilla Q-learning with SGD updates. The two algorithms
also exhibit significantly better performance than the DQN learning method
over a batch of Atari 2600 games.
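As a concrete illustration of the updates being analyzed, the following is a
minimal tabular sketch of Q-learning driven by an AMSGrad-style step, with an
optional momentum restart. The class name, hyperparameter defaults, and the
sign-based restart criterion are illustrative assumptions, not the paper's
exact formulation.

```python
import numpy as np

class QAMSGrad:
    """Tabular Q-learning with an AMSGrad-style update (illustrative sketch).

    The per-entry 'gradient' is the derivative of the squared TD error for
    the visited (state, action) pair, with the bootstrap target held fixed.
    """

    def __init__(self, n_states, n_actions, alpha=0.01, beta1=0.9,
                 beta2=0.999, gamma=0.99, eps=1e-8, restart=False):
        self.Q = np.zeros((n_states, n_actions))
        self.m = np.zeros_like(self.Q)      # first-moment (momentum) estimate
        self.v = np.zeros_like(self.Q)      # second-moment estimate
        self.v_hat = np.zeros_like(self.Q)  # running max of v (the AMSGrad fix)
        self.alpha, self.beta1, self.beta2 = alpha, beta1, beta2
        self.gamma, self.eps, self.restart = gamma, eps, restart

    def update(self, s, a, r, s_next):
        td_error = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
        g = -td_error  # gradient of 0.5 * td_error**2 w.r.t. Q[s, a]

        # Momentum restart (assumed criterion for the Q-AMSGradR variant):
        # reset the momentum when it disagrees in sign with the new gradient.
        if self.restart and self.m[s, a] * g < 0:
            self.m[s, a] = 0.0

        self.m[s, a] = self.beta1 * self.m[s, a] + (1 - self.beta1) * g
        self.v[s, a] = self.beta2 * self.v[s, a] + (1 - self.beta2) * g ** 2
        self.v_hat[s, a] = max(self.v_hat[s, a], self.v[s, a])
        self.Q[s, a] -= self.alpha * self.m[s, a] / (np.sqrt(self.v_hat[s, a]) + self.eps)
        return td_error
```

With restart=False the step is a plain AMSGrad-style update (the Q-AMSGrad
idea); restart=True adds the assumed restart check in the spirit of
Q-AMSGradR.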
Related papers
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm without importance sampling.
Numerical experiments demonstrate the superior performance of both two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Sufficient Exploration for Convex Q-learning [10.75319149461189]
This paper builds on Manne's linear programming (LP) formulation of optimal control.
A primal version is called logistic Q-learning, and a dual variant is convex Q-learning.
It is shown that convex Q-learning is successful in cases where standard Q-learning diverges.
arXiv Detail & Related papers (2022-10-17T20:22:12Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
- Self-correcting Q-Learning [14.178899938667161]
We introduce a new way to address the bias in the form of a "self-correcting algorithm".
Applying this strategy to Q-learning results in Self-correcting Q-learning.
We show theoretically that this new algorithm enjoys the same convergence guarantees as Q-learning while being more accurate.
arXiv Detail & Related papers (2020-12-02T11:36:24Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum. A minimal sketch of the standard double Q-learning update appears after this list.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value using a randomly selected network; an ensemble-style sketch of this idea also appears after this list.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- GenCos' Behaviors Modeling Based on Q Learning Improved by Dichotomy [3.14969586104215]
A novel Q learning algorithm is proposed in this paper.
It modifies the update process of the Q table by dichotomizing the state space and the action space step by step.
Simulation results in a repeated Cournot game show the effectiveness of the proposed algorithm.
arXiv Detail & Related papers (2020-08-04T13:48:09Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimates with excessive variance.
This paper introduces a stochastic recursive gradient framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
- Q-Learning with Differential Entropy of Q-Tables [4.221871357181261]
We conjecture that the reduction in performance during prolonged training sessions of Q-learning is caused by a loss of information.
We introduce the Differential Entropy of Q-tables (DE-QT) as an external information-loss detector for the Q-learning algorithm.
arXiv Detail & Related papers (2020-06-26T04:37:10Z)
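For reference, the double Q-learning scheme analyzed in the finite-time entry
above is van Hasselt's standard two-table update; a minimal tabular sketch is
given below (the function signature, step size, and discount factor are
illustrative).

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    """One tabular double Q-learning step (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        a_star = QA[s_next].argmax()             # greedy action under QA ...
        target = r + gamma * QB[s_next, a_star]  # ... evaluated by QB
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        a_star = QB[s_next].argmax()             # and symmetrically for QB
        target = r + gamma * QA[s_next, a_star]
        QB[s, a] += alpha * (target - QB[s, a])
```

Decoupling action selection from action evaluation across the two tables is
what removes the maximization bias of ordinary Q-learning.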
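The cross Q-learning entry above describes maintaining a set of parallel
models and bootstrapping from a randomly selected one. A rough tabular sketch
of that stated idea follows; the ensemble size and the way the updated and
target tables are chosen are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def cross_q_update(Qs, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    """One ensemble Q-update in the spirit of cross Q-learning (sketch only).

    Qs is a list of K >= 2 parallel Q-tables: one table is chosen for the
    update, and a different, randomly selected table evaluates the bootstrap
    target, generalizing the two-table double Q-learning scheme.
    """
    rng = rng or np.random.default_rng()
    K = len(Qs)
    i = rng.integers(K)                              # table to update
    j = rng.choice([k for k in range(K) if k != i])  # table providing the target
    a_star = Qs[i][s_next].argmax()                  # greedy action under table i
    target = r + gamma * Qs[j][s_next, a_star]       # value estimated by table j
    Qs[i][s, a] += alpha * (target - Qs[i][s, a])
```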