Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time
Switching System Models
- URL: http://arxiv.org/abs/2102.08583v2
- Date: Fri, 19 Feb 2021 15:52:19 GMT
- Title: Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time
Switching System Models
- Authors: Donghwan Lee
- Abstract summary: We prove that Q-learning with a constant step-size can be naturally formulated as a discrete-time switched linear system.
This offers novel and intuitive insights into Q-learning based mainly on control-theoretic frameworks.
- Score: 6.85316573653194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper develops a novel framework to analyze the convergence of the
Q-learning algorithm from a discrete-time switching system perspective. We
prove that asynchronous Q-learning with a constant step-size can be naturally
formulated as a discrete-time stochastic switched linear system. This formulation
offers novel and intuitive insights into Q-learning based mainly on control-theoretic
frameworks. For instance, the proposed analysis explains the overestimation
phenomenon in Q-learning caused by the maximization bias. Based on this
control-theoretic argument and some structural properties of Q-learning, a new
finite-time analysis of Q-learning is given with a novel error bound.
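To make the central object concrete, here is a minimal NumPy sketch (not the paper's exact construction) of the mean dynamics of asynchronous Q-learning written as a switched affine system q_{k+1} = A_{sigma(q_k)} q_k + b, where the switching mode sigma is the greedy policy induced by the current iterate. The small random MDP, the uniform visit distribution D, and all identifiers are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small random MDP (illustrative; all names and sizes are assumptions).
nS, nA, gamma, alpha = 4, 2, 0.9, 0.5
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, :] = next-state distribution
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # immediate rewards
D = np.eye(nS * nA) / (nS * nA)                 # state-action visit distribution (uniform here)
P_flat = P.reshape(nS * nA, nS)                 # row (s, a) -> distribution over s'
r = R.reshape(nS * nA)

def greedy_selector(q):
    """Pi_q: maps the stacked Q-vector to the per-state greedy values,
    i.e. (Pi_q q)[s'] = max_a q(s', a). The active "mode" of the switched
    system is the greedy policy induced by q."""
    Pi = np.zeros((nS, nS * nA))
    a_star = q.reshape(nS, nA).argmax(axis=1)
    Pi[np.arange(nS), np.arange(nS) * nA + a_star] = 1.0
    return Pi

# Mean dynamics as a switched affine system:
#   q_{k+1} = A_sigma q_k + b,  A_sigma = I + alpha * D (gamma * P * Pi_sigma - I)
q = np.zeros(nS * nA)
for _ in range(5000):
    A_sigma = np.eye(nS * nA) + alpha * D @ (gamma * P_flat @ greedy_selector(q) - np.eye(nS * nA))
    b = alpha * D @ r
    q = A_sigma @ q + b

# Sanity check: the fixed point coincides with Q* from Q-value iteration.
Qstar = np.zeros((nS, nA))
for _ in range(5000):
    Qstar = R + gamma * np.einsum("sab,b->sa", P, Qstar.max(axis=1))
print(np.abs(q - Qstar.reshape(-1)).max())   # ~0 (machine precision)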
Related papers
- Unified ODE Analysis of Smooth Q-Learning Algorithms [5.152147416671501]
Recently, a convergence analysis for Q-learning was introduced using a switching system framework.
We present a more general and unified convergence analysis that improves upon the switching system approach.
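As a hedged sketch of what such an ODE model looks like (assuming the tabular setting and a fixed state-action visit distribution D; this is the generic form used in ODE/switching-system analyses of Q-learning, not necessarily the paper's exact statement):

```latex
% ODE associated with Q-learning iterates (sketch):
\dot{q}_t = D\left(R + \gamma P \Pi_{q_t}\, q_t - q_t\right),
% where D is the diagonal matrix of state-action visit probabilities,
% P the transition matrix, R the reward vector, and \Pi_q selects
% \max_a q(\cdot, a) for standard Q-learning; replacing \Pi_q with a
% smooth (e.g., softmax) approximation yields the "smooth" Q-learning
% variants covered by the unified analysis.
```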
arXiv Detail & Related papers (2024-04-20T01:16:27Z) - Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
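Concretely, the soft (entropy-regularized) backup replaces the hard max of standard Q-learning with a log-sum-exp; below is a minimal sketch (illustrative signature and names, not the paper's notation):

```python
import numpy as np

def soft_backup(Q, R, P, gamma, tau):
    """One entropy-regularized ("soft") Bellman backup on a tabular Q.

    V(s) = tau * log(sum_a exp(Q(s, a) / tau)) replaces max_a Q(s, a);
    as tau -> 0 this recovers the standard (hard-max) backup.
    Shapes: Q, R are (nS, nA); P is (nS, nA, nS).
    """
    V = tau * np.log(np.exp(Q / tau).sum(axis=1))      # soft state value
    return R + gamma * np.einsum("sab,b->sa", P, V)    # expected soft target
```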
arXiv Detail & Related papers (2024-03-11T01:36:37Z) - Unifying (Quantum) Statistical and Parametrized (Quantum) Algorithms [65.268245109828]
We take inspiration from Kearns' SQ oracle and Valiant's weak evaluation oracle.
We introduce an extensive yet intuitive framework that yields unconditional lower bounds for learning from evaluation queries.
arXiv Detail & Related papers (2023-10-26T18:23:21Z) - Finite-Time Analysis of Minimax Q-Learning for Two-Player Zero-Sum
Markov Games: Switching System Approach [3.5823366350053325]
This paper investigates the finite-time analysis of a Q-learning algorithm applied to two-player zero-sum Markov games.
We employ the switching system model of minimax Q-learning and the associated value iteration.
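For intuition, the backup underlying minimax Q-learning evaluates a max-min value at each state. The sketch below restricts both players to pure strategies for brevity (the general case solves a matrix game at each state); all shapes and names are illustrative assumptions:

```python
import numpy as np

def minimax_backup(Q, R, P, gamma):
    """One minimax Bellman backup for a two-player zero-sum Markov game.

    Pure-strategy simplification: V(s) = max_a min_b Q(s, a, b).
    Shapes: Q, R are (nS, nA, nB); P is (nS, nA, nB, nS).
    """
    V = Q.min(axis=2).max(axis=1)   # min over opponent action b, then max over a
    return R + gamma * np.einsum("sabt,t->sab", P, V)
```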
arXiv Detail & Related papers (2023-06-09T06:39:37Z) - Classical-to-Quantum Transfer Learning Facilitates Machine Learning with Variational Quantum Circuit [62.55763504085508]
We prove that a classical-to-quantum transfer learning architecture using a Variational Quantum Circuit (VQC) improves the representation and generalization (estimation error) capabilities of the VQC model.
We show that the architecture of classical-to-quantum transfer learning leverages pre-trained classical generative AI models, making it easier to find the optimal parameters for the VQC in the training stage.
arXiv Detail & Related papers (2023-05-18T03:08:18Z) - An Analysis of Quantile Temporal-Difference Learning [53.36758478669685]
Quantile temporal-difference learning (QTD) has proven to be a key component in several successful large-scale applications of reinforcement learning.
Unlike classical TD learning, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points.
This paper proves, with probability 1, convergence of QTD to the fixed points of a related family of dynamic programming procedures.
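For a sense of why QTD updates are non-linear and do not approximate contraction mappings, here is a minimal sketch of a single tabular QTD update in the standard quantile-regression form (names and signature are assumptions for illustration):

```python
import numpy as np

def qtd_update(theta, r, theta_next, gamma, alpha):
    """One quantile TD (QTD) update for a single transition, tabular case.

    theta, theta_next: arrays of m quantile estimates for the current and
    next state; tau_i = (2i + 1) / (2m) are the quantile midpoints.
    Each location moves by the quantile-regression subgradient
    (tau_i - 1{target < theta_i}), averaged over the m sampled targets.
    """
    m = len(theta)
    tau = (2 * np.arange(m) + 1) / (2 * m)
    targets = r + gamma * theta_next                       # one target per next-state quantile
    indicator = (targets[None, :] < theta[:, None]).astype(float)
    grad = (tau[:, None] - indicator).mean(axis=1)
    return theta + alpha * grad
```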
arXiv Detail & Related papers (2023-01-11T13:41:56Z) - Sufficient Exploration for Convex Q-learning [10.75319149461189]
This paper builds on Manne's linear programming (LP) formulation of optimal control.
A primal version is called logistic Q-learning, and a dual variant is convex Q-learning.
It is shown that convex Q-learning is successful in cases where standard Q-learning diverges.
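For reference, Manne's LP characterization of the optimal value function in the tabular discounted case looks roughly as follows (a sketch of the classical LP that convex Q-learning builds on, not the paper's algorithm; the random MDP and weights mu are assumptions):

```python
import numpy as np
from scipy.optimize import linprog

# Primal LP (in V):  minimize  sum_s mu(s) V(s)
#   s.t.  V(s) >= R(s, a) + gamma * sum_s' P(s, a, s') V(s')   for all (s, a)
rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, :] = next-state distribution
R = rng.uniform(0.0, 1.0, size=(nS, nA))
mu = np.full(nS, 1.0 / nS)                      # positive state weights

# Rewrite constraints as A_ub @ V <= b_ub:  (gamma * P[s, a] - e_s) @ V <= -R(s, a)
A_ub = np.zeros((nS * nA, nS))
b_ub = np.zeros(nS * nA)
for s in range(nS):
    for a in range(nA):
        row = gamma * P[s, a].copy()
        row[s] -= 1.0
        A_ub[s * nA + a] = row
        b_ub[s * nA + a] = -R[s, a]

res = linprog(c=mu, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * nS)
V_star = res.x   # optimal value function V*
```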
arXiv Detail & Related papers (2022-10-17T20:22:12Z) - Finite-Time Analysis of Asynchronous Q-learning under Diminishing
Step-Size from Control-Theoretic View [3.5823366350053325]
This paper presents a new finite-time analysis of asynchronous Q-learning under Markovian observation models.
In particular, we introduce a discrete-time time-varying switching system model of Q-learning with diminishing step-sizes.
The proposed analysis brings in additional insights, covers different scenarios, and provides new simplified templates for analysis.
arXiv Detail & Related papers (2022-07-25T14:15:55Z) - Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns these basis functions using a supervised learning approach.
arXiv Detail & Related papers (2021-09-06T04:39:06Z) - Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees.
We establish the convergence guarantee for MomentumQ with linear function approximation and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z) - Analysis of Q-learning with Adaptation and Momentum Restart for Gradient
Descent [47.3692506462581]
We first characterize the convergence rate of Q-AMSGrad, the Q-learning algorithm with the AMSGrad update.
To further improve performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm.
Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates.
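To make the update rule concrete, here is a hedged sketch of applying the AMSGrad rule to the TD "gradient" of tabular Q-learning (illustrative only; the paper's Q-AMSGrad may differ in details such as bias correction):

```python
import numpy as np

def q_amsgrad_step(Q, g, m, v, vhat, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Q-AMSGrad-style step on a tabular Q (sketch, not the paper's exact algorithm).

    g is the (negative) TD error treated as a gradient, e.g.
    g[s, a] = -(r + gamma * max_a' Q[s', a'] - Q[s, a]) on the visited entry,
    zero elsewhere; m, v, vhat carry the AMSGrad moment estimates.
    """
    m = beta1 * m + (1 - beta1) * g                  # first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2               # second-moment estimate
    vhat = np.maximum(vhat, v)                       # AMSGrad: non-decreasing v-hat
    Q = Q - alpha * m / (np.sqrt(vhat) + eps)        # adaptive update
    return Q, m, v, vhat
```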
arXiv Detail & Related papers (2020-07-15T01:11:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.