Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time
Switching System Models
- URL: http://arxiv.org/abs/2102.08583v2
- Date: Fri, 19 Feb 2021 15:52:19 GMT
- Title: Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time
Switching System Models
- Authors: Donghwan Lee
- Abstract summary: We prove that Q-learning with a constant step-size can be naturally formulated as a discrete-time switched linear system.
It offers novel and intuitive insights into Q-learning based mainly on control-theoretic frameworks.
- Score: 6.85316573653194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper develops a novel framework for analyzing the convergence of
the Q-learning algorithm from a discrete-time switching system perspective. We
prove that asynchronous Q-learning with a constant step-size can be naturally
formulated as a discrete-time stochastic switched linear system. This viewpoint
offers novel and intuitive insights into Q-learning based mainly on
control-theoretic frameworks. For instance, the proposed analysis explains the
overestimation phenomenon in Q-learning due to the maximization bias. Based on
control-system-theoretic arguments and some useful structural properties of
Q-learning, a new finite-time analysis of Q-learning is given with a novel
error bound.
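To make the switching-system viewpoint concrete, the sketch below runs tabular asynchronous Q-learning with a constant step-size on a small random MDP, and the closing comment states the expected-update form as an affine switched linear system. The MDP, step-size, and exploration policy are placeholder assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, alpha = 4, 2, 0.9, 0.1

# Placeholder MDP: random transition kernel P[a, s, s'] and reward R[s, a].
P = rng.dirichlet(np.ones(nS), size=(nA, nS))
R = rng.uniform(0.0, 1.0, size=(nS, nA))

Q = np.zeros((nS, nA))
s = 0
for k in range(20000):
    a = rng.integers(nA)                      # uniform exploration (assumption)
    s_next = rng.choice(nS, p=P[a, s])
    # Asynchronous update: only the visited (s, a) entry changes per step.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

# In expectation the iterate obeys
#   Q_{k+1} = Q_k + alpha * D * (R + gamma * P * Pi_{sigma(Q_k)} * Q_k - Q_k),
# with D the diagonal state-action visit distribution and Pi_{sigma(Q_k)} the
# greedy-policy selector: an affine linear system whose system matrix switches
# with the greedy policy sigma(Q_k) of the current iterate.
print("greedy policy:", Q.argmax(axis=1))
```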
Related papers
- Unified ODE Analysis of Smooth Q-Learning Algorithms [5.152147416671501]
Recently, a convergence analysis for Q-learning was introduced using a switching system framework.
We present a more general and unified convergence analysis that improves upon the switching system approach.
arXiv Detail & Related papers (2024-04-20T01:16:27Z) - Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
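As background, soft Q-learning replaces the hard max in the Bellman backup with an entropy-regularized log-sum-exp; a minimal numpy sketch of that backup, with a placeholder temperature `tau`:

```python
import numpy as np

def soft_backup(q_next, tau):
    """Entropy-regularized backup tau * log(sum_a exp(q(s', a) / tau)),
    computed in a numerically stable way; recovers max_a q(s', a) as tau -> 0."""
    m = q_next.max()
    return m + tau * np.log(np.exp((q_next - m) / tau).sum())

# Soft Q-learning TD update for one transition (s, a, r, s'):
#   Q[s, a] += alpha * (r + gamma * soft_backup(Q[s_next], tau) - Q[s, a])
q_next = np.array([1.0, 0.5])
for tau in (1.0, 0.1, 0.01):
    print(tau, soft_backup(q_next, tau))      # tends to max(q_next) = 1.0
```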
arXiv Detail & Related papers (2024-03-11T01:36:37Z) - Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z) - Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum
Markov Games: Switching System Approach [3.5823366350053325]
This paper investigates the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games.
We employ the switching system model of minimax Q-learning and the associated value iteration.
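For reference, minimax Q-learning bootstraps with the value of the matrix game at the next state; the sketch below uses the pure-strategy max-min value as a simplified stand-in for the mixed-strategy LP of Littman's minimax-Q:

```python
import numpy as np

def maxmin_value(q_next):
    """Pure-strategy max-min value of the matrix game Q(s', ., .).

    q_next: (nA, nB) payoffs for the maximizer against the minimizer.
    Minimax-Q proper solves a small LP for the mixed-strategy value.
    """
    return q_next.min(axis=1).max()

# Update for one transition (s, a, b, r, s') with step-size alpha:
#   Q[s, a, b] += alpha * (r + gamma * maxmin_value(Q[s_next]) - Q[s, a, b])
```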
arXiv Detail & Related papers (2023-06-09T06:39:37Z) - An Analysis of Quantile Temporal-Difference Learning [53.36758478669685]
Quantile temporal-difference learning (QTD) has proven to be a key component in several successful large-scale applications of reinforcement learning.
Unlike classical TD learning, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points.
This paper proves convergence, with probability 1, to the fixed points of a related family of dynamic programming procedures.
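To illustrate why QTD updates are non-linear and approximate no contraction mapping, here is a minimal tabular sketch of one QTD step (the quantile levels are the standard midpoints; variable names are assumptions):

```python
import numpy as np

def qtd_update(theta_s, theta_next, r, gamma, alpha):
    """One QTD step on the m quantile estimates of the return at state s.

    theta_s, theta_next: arrays of m quantile estimates at levels
    tau_i = (2i - 1) / (2m).
    """
    m = len(theta_s)
    tau = (2.0 * np.arange(m) + 1.0) / (2.0 * m)
    targets = r + gamma * theta_next              # m bootstrapped samples
    for i in range(m):
        # Quantile-regression subgradient: an indicator, not a linear map,
        # which is the source of the non-linearity noted above.
        theta_s[i] += alpha * np.mean(tau[i] - (targets < theta_s[i]))
    return theta_s
```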
arXiv Detail & Related papers (2023-01-11T13:41:56Z) - Sufficient Exploration for Convex Q-learning [10.75319149461189]
This paper builds on Manne's linear programming (LP) formulation of optimal control.
A primal version is called logistic Q-learning, and a dual variant is convex Q-learning.
It is shown that convex Q-learning is successful in cases where standard Q-learning diverges.
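Manne's LP can be written down directly for a small MDP; a minimal scipy sketch (the random MDP and the uniform objective weights are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nA, nS))    # P[a, s, s']
R = rng.uniform(0.0, 1.0, size=(nS, nA))

# Manne's LP: minimize sum_s V(s) subject to
#   V(s) >= R(s, a) + gamma * sum_{s'} P(s'|s, a) V(s')  for all (s, a).
A_ub, b_ub = [], []
for s in range(nS):
    for a in range(nA):
        row = gamma * P[a, s]
        row[s] -= 1.0                            # gamma * P V - V <= -R
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c=np.ones(nS), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * nS)
print("optimal value function:", res.x)
```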
arXiv Detail & Related papers (2022-10-17T20:22:12Z) - Finite-Time Analysis of Asynchronous Q-learning under Diminishing
Step-Size from Control-Theoretic View [3.5823366350053325]
This paper develops a new finite-time analysis of asynchronous Q-learning under Markovian observation models.
In particular, we introduce a discrete-time time-varying switching system model of Q-learning with diminishing step-sizes.
The proposed analysis brings in additional insights, covers different scenarios, and provides new simplified templates for analysis.
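Relative to the constant step-size sketch above, the only algorithmic change is a step-size that decays over time; a common polynomial schedule (an assumption here, not necessarily the schedule analyzed in the paper):

```python
def step_size(k, omega=0.85):
    """Diminishing step-size alpha_k = 1 / (k + 1)**omega.

    omega in (1/2, 1] is a standard Robbins-Monro-style choice.
    """
    return 1.0 / (k + 1) ** omega

# Inside the Q-learning loop, replace the constant alpha with, e.g.:
#   N[s, a] += 1
#   Q[s, a] += step_size(N[s, a]) * td
```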
arXiv Detail & Related papers (2022-07-25T14:15:55Z) - Online Target Q-learning with Reverse Experience Replay: Efficiently
finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
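The core mechanism of reverse experience replay is to process a stored trajectory from its end backwards, so that freshly updated bootstrap values propagate along the trajectory in a single pass; a minimal tabular sketch (Q-Rex additionally combines this with periodic target updates, omitted here; names are assumptions):

```python
import numpy as np

def reverse_replay_update(Q, trajectory, gamma=0.99, alpha=0.1):
    """Replay one stored trajectory backwards.

    Q: (nS, nA) numpy array of action values;
    trajectory: list of (s, a, r, s_next) tuples in visit order.
    """
    for s, a, r, s_next in reversed(trajectory):
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```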
arXiv Detail & Related papers (2021-10-16T01:47:41Z) - Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns the basis functions of such a representation via supervised learning.
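The underlying Koopman idea is to lift states into observables psi(x) on which the dynamics act linearly, psi(x_{k+1}) ≈ K psi(x_k); the paper learns psi with a supervised deep network, while the sketch below fits K by least squares for a fixed, user-supplied feature map:

```python
import numpy as np

def fit_koopman(X, X_next, psi):
    """Least-squares fit of a Koopman operator K with psi(x') ~= K psi(x).

    X, X_next: (N, d) snapshot pairs; psi: feature map from R^d to R^m.
    A hand-crafted psi stands in for the learned basis functions here.
    """
    Phi = np.apply_along_axis(psi, 1, X)          # (N, m) lifted states
    Phi_next = np.apply_along_axis(psi, 1, X_next)
    W, *_ = np.linalg.lstsq(Phi, Phi_next, rcond=None)
    return W.T                                    # psi(x') ~= K @ psi(x)
```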
arXiv Detail & Related papers (2021-09-06T04:39:06Z) - Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
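A simplified picture of momentum-based Q-learning: accumulate past TD directions in a momentum term and step along it. The paper's MomentumQ blends Nesterov and Polyak momentum, so the plain heavy-ball variant below is an illustrative assumption, not the exact algorithm:

```python
def momentum_q_step(Q, M, s, a, r, s_next, gamma=0.99, alpha=0.1, beta=0.9):
    """One tabular Q-learning step with Polyak (heavy-ball) momentum.

    Q, M: (nS, nA) numpy arrays; M accumulates past TD directions entrywise.
    """
    td = r + gamma * Q[s_next].max() - Q[s, a]
    M[s, a] = beta * M[s, a] + td
    Q[s, a] += alpha * M[s, a]
    return Q, M
```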
arXiv Detail & Related papers (2020-07-30T12:27:03Z) - Analysis of Q-learning with Adaptation and Momentum Restart for Gradient
Descent [47.3692506462581]
We first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update.
To further improve performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm.
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform the vanilla Q-learning with SGD updates.
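For concreteness, Q-AMSGrad treats the TD direction as the (negative) gradient inside a standard AMSGrad update; a minimal tabular sketch with placeholder hyperparameters:

```python
import numpy as np

def q_amsgrad_step(Q, m, v, vhat, s, a, r, s_next,
                   gamma=0.99, alpha=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One tabular Q-AMSGrad step; m, v, vhat are (nS, nA) arrays, zero
    initially. Hyperparameter values are placeholders, not the paper's."""
    g = r + gamma * Q[s_next].max() - Q[s, a]     # TD direction at (s, a)
    m[s, a] = b1 * m[s, a] + (1 - b1) * g
    v[s, a] = b2 * v[s, a] + (1 - b2) * g ** 2
    vhat[s, a] = max(vhat[s, a], v[s, a])         # AMSGrad's running max
    Q[s, a] += alpha * m[s, a] / (np.sqrt(vhat[s, a]) + eps)
    return Q, m, v, vhat
```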
arXiv Detail & Related papers (2020-07-15T01:11:43Z)