Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation
- URL: http://arxiv.org/abs/2302.13087v2
- Date: Mon, 1 Apr 2024 02:57:46 GMT
- Title: Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation
- Authors: Zhifa Ke, Junyu Zhang, Zaiwen Wen
- Abstract summary: A Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation.
In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of the Mean-Squared Bellman Error (MSBE), where target networks are adopted to avoid double sampling.
We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
- Score: 11.925232472331494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation. In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of Mean-Squared Bellman Error (MSBE), where target networks are adopted to avoid double sampling. Inexact GN steps are analyzed so that one can safely and efficiently compute the GN updates by cheap matrix iterations. Under mild conditions, non-asymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations. In particular, for neural network parameterization with ReLU activation, GNTD achieves an improved sample complexity of $\tilde{\mathcal{O}}(\varepsilon^{-1})$, as opposed to the $\mathcal{O}(\varepsilon^{-2})$ sample complexity of the existing neural TD methods. An $\tilde{\mathcal{O}}(\varepsilon^{-1.5})$ sample complexity of GNTD is also established for general smooth function approximations. We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
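As a reading aid, here is a minimal NumPy sketch of the update the abstract describes: form TD targets with a frozen target network, then take one damped Gauss-Newton step on the resulting fixed-target MSBE over a sampled batch. This is an illustrative, assumption-laden sketch rather than the authors' implementation: the generic Q-function `q(params, state, action)`, the finite-difference Jacobian, and the dense damped solve are all placeholders, whereas the paper analyzes inexact GN steps computed by cheap matrix iterations.
```python
# Minimal sketch of one GNTD-style update (illustrative only; not the
# authors' implementation). `q(params, state, action)` is an assumed
# generic parametric Q-function returning a scalar.
import numpy as np

def gntd_step(params, q, states, actions, rewards, next_states,
              target_params, n_actions, gamma=0.99, damping=1e-3, eps=1e-5):
    """One damped Gauss-Newton step on the fixed-target MSBE over a batch."""
    # TD targets use frozen target-network parameters to avoid double sampling.
    next_q = np.array([[q(target_params, s2, a2) for a2 in range(n_actions)]
                       for s2 in next_states])
    targets = np.asarray(rewards) + gamma * next_q.max(axis=1)

    # Bellman residuals r_i = Q_theta(s_i, a_i) - target_i.
    preds = np.array([q(params, s, a) for s, a in zip(states, actions)])
    residuals = preds - targets

    # Jacobian J[i, j] = d Q_theta(s_i, a_i) / d theta_j, formed by finite
    # differences only to keep the sketch short (autodiff in practice).
    J = np.zeros((len(preds), len(params)))
    for j in range(len(params)):
        bumped = np.array(params, dtype=float)
        bumped[j] += eps
        J[:, j] = (np.array([q(bumped, s, a) for s, a in zip(states, actions)])
                   - preds) / eps

    # Damped Gauss-Newton direction: (J^T J + damping * I) delta = -J^T r.
    # The paper analyzes *inexact* solves of this system in place of the
    # dense solve below.
    lhs = J.T @ J + damping * np.eye(len(params))
    delta = np.linalg.solve(lhs, -(J.T @ residuals))
    return params + delta
```
In practice the Jacobian would come from automatic differentiation and the linear system would be solved only approximately (e.g., a few iterations of a matrix-iteration solver such as conjugate gradient), in line with the paper's point that inexact GN updates can be computed cheaply while retaining the stated convergence guarantees.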
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of
Neural Networks with Polynomial Width, Samples, and Time [37.73689342377357]
It is still an open question whether gradient descent on networks without unnatural modifications can achieve better sample complexity than kernel methods.
We show that projected gradient descent with a positive learning rate converges to low error with polynomial sample complexity.
arXiv Detail & Related papers (2023-06-28T16:45:38Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular due to minimizing the number of cache misses and good cache locality.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Stochastic Zeroth order Descent with Structured Directions [10.604744518360464]
We introduce and analyze Structured Zeroth order Descent (SSZD), a finite-difference approach that approximates a gradient on a set of $l \leq d$ directions, where $d$ is the dimension of the ambient space.
For convex functions, we prove almost sure convergence and a rate on the function values of $O((d/l)k^{-c})$ for every $c < 1/2$, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of the number of iterations.
arXiv Detail & Related papers (2022-06-10T14:00:06Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two-timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two-timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)