Two-Step Q-Learning
- URL: http://arxiv.org/abs/2407.02369v1
- Date: Tue, 2 Jul 2024 15:39:00 GMT
- Title: Two-Step Q-Learning
- Authors: Antony Vijesh, Shreyas S R
- Abstract summary: The paper proposes a novel off-policy two-step Q-learning algorithm without importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical advantages over existing methods. This paper proposes a novel off-policy two-step Q-learning algorithm that does not require importance sampling. Under suitable assumptions, it is shown that the iterates of the proposed two-step Q-learning remain bounded and converge almost surely to the optimal Q-values. This study also addresses the convergence analysis of the smooth version of two-step Q-learning, obtained by replacing the max function with the log-sum-exp function. The proposed algorithms are robust and easy to implement. Finally, we test the proposed algorithms on benchmark problems such as the roulette problem, the maximization bias problem, and randomly generated Markov decision processes, and compare them with existing methods from the literature. Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
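For context, the sketch below is a minimal illustration, not the authors' implementation: it shows the standard one-step tabular Q-learning update together with the smoothing mentioned in the abstract, in which the hard max over next-state action values is replaced by a log-sum-exp. The two-step lookahead itself is not specified in the abstract and is therefore omitted; the learning rate `alpha`, discount `gamma`, and temperature `beta` are illustrative assumptions.

```python
import numpy as np

def log_sum_exp_value(q_row, beta=10.0):
    # Smooth stand-in for max_a Q(s', a): (1 / beta) * log(sum_a exp(beta * Q(s', a))).
    # As beta grows this approaches the hard max used by standard Q-learning.
    z = beta * np.asarray(q_row, dtype=float)
    z_max = np.max(z)
    return (z_max + np.log(np.sum(np.exp(z - z_max)))) / beta

def one_step_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, smooth=False, beta=10.0):
    # One-step tabular update with either the hard-max or the smoothed bootstrap.
    # The paper's additional two-step lookahead is intentionally not reproduced here.
    if smooth:
        bootstrap = log_sum_exp_value(Q[s_next], beta)
    else:
        bootstrap = np.max(Q[s_next])
    td_target = r + gamma * bootstrap
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Since $\max_a Q(s',a) \le \frac{1}{\beta}\log\sum_a e^{\beta Q(s',a)} \le \max_a Q(s',a) + \frac{\log|A|}{\beta}$, a larger `beta` keeps the smoothed target close to the standard one while remaining differentiable.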
Related papers
- Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity [3.4376560669160394]
We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ)
VRCQ provides superior guarantees in the $\ell_\infty$-norm compared with the existing model-free approximation-type algorithms.
arXiv Detail & Related papers (2024-08-13T00:34:33Z) - Stability of Q-Learning Through Design and Optimism [0.0]
This paper is in part a tutorial on approximation and Q-learning.
It provides details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023.
The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms.
arXiv Detail & Related papers (2023-07-05T20:04:26Z) - Sufficient Exploration for Convex Q-learning [10.75319149461189]
This paper builds on the linear programming (LP) formulation of optimal control of Manne.
A primal version is called logistic Q-learning, and a dual variant is convex Q-learning.
It is shown that convex Q-learning is successful in cases where standard Q-learning diverges.
arXiv Detail & Related papers (2022-10-17T20:22:12Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - Q-Match: Iterative Shape Matching via Quantum Annealing [64.74942589569596]
Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP)
This paper proposes Q-Match, a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm.
Q-Match can be applied for shape matching problems iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems.
arXiv Detail & Related papers (2021-05-06T17:59:38Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
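For reference, the algorithm under analysis is double Q-learning, whose standard tabular form (two tables, one selecting the greedy next action and the other evaluating it) is sketched below. This is the textbook update rather than the specific synchronous and asynchronous schemes analyzed in the paper; the step size `alpha` and discount `gamma` are illustrative.

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    # Tabular double Q-learning step: one table picks the greedy next action,
    # the other evaluates it, which decouples selection from evaluation
    # and removes the maximization bias of standard Q-learning.
    rng = rng or np.random.default_rng()
    if rng.random() < 0.5:
        a_star = np.argmax(QA[s_next])  # QA selects
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])  # QB evaluates
    else:
        b_star = np.argmax(QB[s_next])  # QB selects
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])  # QA evaluates
    return QA, QB
```

Updating only one randomly chosen table per step keeps each table's bootstrap target independent of its own argmax, which is the source of the bias reduction.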
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Convex Q-Learning, Part 1: Deterministic Optimal Control [5.685589351789462]
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
arXiv Detail & Related papers (2020-08-08T17:17:42Z) - Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z) - Analysis of Q-learning with Adaptation and Momentum Restart for Gradient
Descent [47.3692506462581]
We first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update.
To further improve the performance, we propose to incorporate a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm.
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates.
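As a rough illustration only (not the paper's exact formulation), the sketch below applies an AMSGrad-style adaptive step to a tabular Q-learning update, treating the negative TD error as the gradient for the visited state-action entry; the hyperparameters and per-entry moment tables are assumptions made for the sketch. The restart variant described above (Q-AMSGradR) would additionally reset the momentum table `m` periodically.

```python
import numpy as np

def q_amsgrad_update(Q, m, v, v_hat, s, a, r, s_next,
                     lr=0.01, gamma=0.99, beta1=0.9, beta2=0.999, eps=1e-8):
    # Treat the negative TD error as the gradient for entry (s, a) and apply an
    # AMSGrad-style step: first moment, second moment, and a running max of the
    # second moment in the denominator.  m, v, v_hat have the same shape as Q.
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    g = -td_error
    m[s, a] = beta1 * m[s, a] + (1 - beta1) * g
    v[s, a] = beta2 * v[s, a] + (1 - beta2) * g ** 2
    v_hat[s, a] = max(v_hat[s, a], v[s, a])
    Q[s, a] -= lr * m[s, a] / (np.sqrt(v_hat[s, a]) + eps)
    return Q, m, v, v_hat
```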
arXiv Detail & Related papers (2020-07-15T01:11:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.