Finite Horizon Q-learning: Stability, Convergence and Simulations
- URL: http://arxiv.org/abs/2110.15093v1
- Date: Wed, 27 Oct 2021 16:18:44 GMT
- Title: Finite Horizon Q-learning: Stability, Convergence and Simulations
- Authors: Vivek VP, Shalabh Bhatnagar
- Abstract summary: We develop a version of the Q-learning algorithm for finite horizon Markov decision processes (MDPs).
Our analysis of the stability and convergence of finite horizon Q-learning is based entirely on the ordinary differential equation (ODE) method.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Q-learning is a popular reinforcement learning algorithm. This
algorithm has, however, been studied and analysed mainly in the infinite
horizon setting. There are several important applications which can be modeled
in the framework of finite horizon Markov decision processes. We develop a
version of the Q-learning algorithm for finite horizon Markov decision
processes (MDPs) and provide a full proof of its stability and convergence. Our
analysis of the stability and convergence of finite horizon Q-learning is based
entirely on the ordinary differential equation (ODE) method. We also
demonstrate the performance of our algorithm in a setting of randomly generated
MDPs.
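Since the abstract describes the algorithm only in words, the following is a minimal sketch of tabular finite-horizon Q-learning on a randomly generated MDP. The horizon-indexed tables Q[h] and the stage-wise backup toward Q[h+1] are the defining features; the random-MDP construction, constant step size, and epsilon-greedy exploration are illustrative assumptions rather than the paper's exact setup (in particular, the convergence analysis assumes diminishing step sizes).

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, H = 5, 3, 10        # states, actions, horizon (illustrative sizes)
n_episodes = 20000
alpha = 0.1                 # constant step size, for simplicity only
eps = 0.1                   # epsilon-greedy exploration

# Random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.uniform(0.0, 1.0, size=(nS, nA))

# One Q-table per stage; Q[H] stays identically zero (terminal stage).
Q = np.zeros((H + 1, nS, nA))

for _ in range(n_episodes):
    s = rng.integers(nS)
    for h in range(H):
        # epsilon-greedy action at stage h
        a = rng.integers(nA) if rng.random() < eps else int(np.argmax(Q[h, s]))
        s_next = rng.choice(nS, p=P[s, a])
        # finite-horizon backup: bootstrap from the NEXT stage's table
        target = R[s, a] + np.max(Q[h + 1, s_next])
        Q[h, s, a] += alpha * (target - Q[h, s, a])
        s = s_next

# The greedy policy is stage-dependent: pi[h, s] = argmax_a Q[h, s, a].
pi = np.argmax(Q, axis=2)[:H]
print(pi)
```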
Related papers
- Two-Step Q-Learning
The paper proposes a novel off-policy two-step Q-learning algorithm that avoids importance sampling.
Numerical experiments demonstrate the superior performance of both two-step Q-learning and its smooth variants.
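For concreteness, here is a generic two-step bootstrapped Q-update; it only illustrates what "two-step" means, and the cited paper's exact off-policy update (which avoids importance sampling) may differ from this naive form.

```python
# A generic two-step bootstrapped Q-update (illustrative, not the cited
# paper's exact algorithm). Q is a (num_states, num_actions) array.
def two_step_update(Q, s, a, r1, r2, s2, alpha, gamma):
    # The target looks two transitions ahead (rewards r1, r2, landing
    # state s2) before bootstrapping with the greedy value at s2.
    target = r1 + gamma * r2 + gamma**2 * Q[s2].max()
    Q[s, a] += alpha * (target - Q[s, a])
```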
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Unified ODE Analysis of Smooth Q-Learning Algorithms
Recently, a convergence analysis for Q-learning was introduced using a switching system framework.
We present a more general and unified convergence analysis that improves upon the switching system approach.
arXiv Detail & Related papers (2024-04-20T01:16:27Z)
- Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach
Soft Q-learning is a variation of Q-learning designed to solve entropy-regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
arXiv Detail & Related papers (2024-03-11T01:36:37Z) - Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide a stability and generalization analysis of stochastic compositional gradient descent algorithms built from training examples.
We establish uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC.
We derive excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors.
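As a concrete reference point, here is a minimal sketch of SCGD on a toy compositional objective f(g(x)): the auxiliary variable y tracks the inner expectation g(x) with a faster-moving average, which is the defining trick of the algorithm. The toy objective, noise model, and step-size schedules are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_sample(x):   # noisy estimate of the inner map g(x) = x (toy choice)
    return x + rng.normal(scale=0.1, size=x.shape)

def g_jac(x):      # Jacobian of g (identity for this toy g)
    return np.eye(x.size)

def f_grad(y):     # gradient of the outer f(y) = 0.5 * ||y||^2
    return y

x = rng.normal(size=3)
y = g_sample(x)
for t in range(1, 2001):
    alpha, beta = 0.5 / t, 1.0 / t**0.75   # two time scales: y moves faster
    y = (1 - beta) * y + beta * g_sample(x)    # running estimate of g(x)
    x = x - alpha * g_jac(x).T @ f_grad(y)     # quasi-gradient step
print(x)   # should approach the minimizer 0
```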
arXiv Detail & Related papers (2023-07-07T02:40:09Z)
- An Analysis of Quantile Temporal-Difference Learning
Quantile temporal-difference learning (QTD) has proven to be a key component in several successful large-scale applications of reinforcement learning.
Unlike classical TD learning, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points.
This paper proves convergence, with probability 1, to the fixed points of a related family of dynamic programming procedures.
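To make the non-linearity of the updates concrete, here is a sketch of a generic QTD(0) update for policy evaluation; the quantile levels, table layout, and step size are illustrative assumptions, and the indicator term is what makes the update non-linear and non-contractive.

```python
import numpy as np

# theta is a (num_states, m) array: m quantile estimates per state, at
# the midpoint levels tau_i = (i + 0.5) / m.
def qtd_update(theta, s, r, s_next, gamma, alpha):
    m = theta.shape[1]
    tau = (np.arange(m) + 0.5) / m           # quantile midpoints
    targets = r + gamma * theta[s_next]      # m bootstrapped samples
    for i in range(m):
        # sub-gradient of the quantile (pinball) loss, averaged over targets:
        # the indicator makes the update non-linear in theta.
        indicator = (targets < theta[s, i]).mean()
        theta[s, i] += alpha * (tau[i] - indicator)
```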
arXiv Detail & Related papers (2023-01-11T13:41:56Z)
- Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0<\varepsilon<1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the Hamilton-Jacobi-Bellman (HJB) equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees.
We establish the convergence guarantee for MomentumQ with linear function approximation and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
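As a rough illustration of the idea, here is a generic momentum-accelerated tabular Q-update that reuses past TD directions; the MomentumQ algorithm of the cited paper (which is analyzed with linear function approximation) may differ from this simple form.

```python
# A generic momentum-accelerated tabular Q-update (illustrative only).
# Q and V are (num_states, num_actions) arrays; V is the momentum buffer.
def momentum_q_update(Q, V, s, a, r, s_next, alpha, beta, gamma):
    delta = r + gamma * Q[s_next].max() - Q[s, a]   # TD error
    V[s, a] = beta * V[s, a] + delta                # accumulate past directions
    Q[s, a] += alpha * V[s, a]
```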
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM).
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)