Related papers: Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes

URL: http://arxiv.org/abs/2102.09385v3
Date: Tue, 9 Jan 2024 16:01:17 GMT
Title: Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes
Authors: Steffen Dereich and Sebastian Kassing
Abstract summary: We consider convergence of gradient descent schemes under weak assumptions on the underlying landscape. In particular, we show that for neural networks with analytic activation function such as softplus, sigmoid and the hyperbolic tangent, SGD converges on the event of staying bounded.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this article, we consider convergence of stochastic gradient descent schemes (SGD), including momentum stochastic gradient descent (MSGD), under weak assumptions on the underlying landscape. More explicitly, we show that on the event that the SGD stays bounded we have convergence of the SGD if there is only a countable number of critical points or if the objective function satisfies Lojasiewicz-inequalities around all critical levels as all analytic functions do. In particular, we show that for neural networks with analytic activation function such as softplus, sigmoid and the hyperbolic tangent, SGD converges on the event of staying bounded, if the random variables modelling the signal and response in the training are compactly supported.
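The abstract's setting can be illustrated numerically. The following is a minimal sketch, not the authors' code. It runs momentum SGD (MSGD) with decreasing step sizes on a one-hidden-layer softplus network. The data model (X uniform on [-1, 1], Y = sin(pi X)), the width H = 8, the batch size, and the schedule gamma_n = gamma0 * n^(-alpha) are all hypothetical choices. Softplus is analytic, so the squared-error loss satisfies Lojasiewicz inequalities. In their standard local form, near a critical point x* these read |f(x) - f(x*)|^kappa <= C |grad f(x)| for some kappa in [1/2, 1). By the result above, bounded iterates should therefore converge. A simulation can only suggest this. It checks that the iterates stay finite and that the increments shrink, which is consistent with convergence but does not prove it.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); softplus is an analytic activation.
    return np.logaddexp(0.0, z)

def sigmoid(z):
    # Derivative of softplus.
    return 1.0 / (1.0 + np.exp(-z))

def sample_batch(rng, batch_size):
    # Hypothetical data model: signal X ~ Uniform[-1, 1] and response
    # Y = sin(pi X), so both are compactly supported as the abstract requires.
    x = rng.uniform(-1.0, 1.0, size=batch_size)
    return x, np.sin(np.pi * x)

def grad(theta, x, y, H):
    # Network f(x) = sum_j a_j softplus(w_j x + b_j) + c with H hidden units,
    # parameters packed as theta = (w, b, a, c); loss = mean squared error / 2.
    w, b, a, c = theta[:H], theta[H:2 * H], theta[2 * H:3 * H], theta[3 * H]
    pre = np.outer(x, w) + b                # shape (batch, H)
    act = softplus(pre)
    r = act @ a + c - y                     # residuals
    d_pre = r[:, None] * a * sigmoid(pre)   # dLoss/dpre, before dividing by batch size
    n = x.size
    return np.concatenate([
        d_pre.T @ x / n,                    # dL/dw
        d_pre.sum(axis=0) / n,              # dL/db
        act.T @ r / n,                      # dL/da
        [r.mean()],                         # dL/dc
    ])

def msgd(theta0, H, n_steps=2000, gamma0=0.02, alpha=0.75, beta=0.5,
         batch_size=32, seed=0):
    # Momentum SGD with decreasing steps gamma_n = gamma0 * n**(-alpha);
    # beta = 0 gives plain SGD. Returns every iterate for inspection.
    rng = np.random.default_rng(seed)
    theta, v = theta0.copy(), np.zeros_like(theta0)
    path = [theta.copy()]
    for n in range(1, n_steps + 1):
        x, y = sample_batch(rng, batch_size)
        v = beta * v - gamma0 * n ** (-alpha) * grad(theta, x, y, H)
        theta = theta + v
        path.append(theta.copy())
    return np.array(path)

H = 8
rng = np.random.default_rng(1)
theta0 = np.concatenate([rng.normal(size=2 * H), 0.1 * rng.normal(size=H), [0.0]])
path = msgd(theta0, H)
steps = np.linalg.norm(np.diff(path, axis=0), axis=1)   # |theta_{n+1} - theta_n|
print(steps[:100].mean(), steps[-100:].mean())
```

Setting beta=0 in the call gives plain SGD on the same landscape. The paper's statement is conditional on boundedness, which is why the sketch tracks the whole path rather than only the final loss.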

Related papers

Convergence of SGD with momentum in the nonconvex case: A time window-based analysis [2.048226951354646]
Stochastic gradient descent with momentum (SGDM) is a common approach for solving large-scale optimization problems. Despite its popularity, the convergence of SGDM remains less well understood in nonconvex settings.
arXiv Detail & Related papers (2024-05-27T08:46:28Z)
Demystifying the Myths and Legends of Nonconvex Convergence of SGD [17.445810977264067]
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems. Through our analyses, we address certain myths and legends related to the nonconvex convergence of SGD.
arXiv Detail & Related papers (2023-10-19T17:58:59Z)
Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks [7.9626223030099545]
We establish local convergence of SGD with positive probability under a local Lojasiewicz condition. We provide examples of neural networks with finite widths such that our assumptions hold.
arXiv Detail & Related papers (2023-04-18T18:20:52Z)
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent [50.4531316289086]
Stochastic gradient descent (SGD) has been the method of choice for learning large-scale non-convex models. An overarching theme of the paper is providing general conditions under which SGD converges, assuming that gradient flow (GF) on the population loss converges. We provide a unified analysis for GD/SGD not only for classical settings like convex losses, but also for more complex problems including phase retrieval and matrix square root.
arXiv Detail & Related papers (2022-10-13T03:55:04Z)
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
On the Convergence of mSGD and AdaGrad for Stochastic Optimization [0.696125353550498]
Stochastic gradient descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade. Some modified SGD-type algorithms, such as momentum-based SGD (mSGD) and adaptive gradient optimization (AdaGrad), outperform SGD in many competitions and applications in terms of convergence rate and accuracy. We focus on convergence analysis of mSGD and AdaGrad for any smooth (possibly non-convex) loss functions in machine learning.
arXiv Detail & Related papers (2022-01-26T22:02:21Z)
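The mSGD and AdaGrad updates named in the preceding entry can be made concrete with a minimal sketch on a toy problem. The objective f(x) = sum_i (x_i^2 - 1)^2 / 4 is hypothetical, as are the Gaussian gradient noise and all step-size constants. The specific mSGD and AdaGrad variants analyzed in that paper may differ, for instance in how the momentum or the gradient accumulator is normalized. Heavy-ball momentum with decreasing step sizes and diagonal AdaGrad with a constant base rate are shown here because they are the textbook forms.

```python
import numpy as np

def noisy_grad(x, rng, sigma=0.1):
    # Gradient of the toy objective f(x) = sum_i (x_i**2 - 1)**2 / 4, whose
    # minimizers have every x_i = +/-1, plus Gaussian noise (an assumed noise model).
    return x ** 3 - x + sigma * rng.normal(size=x.shape)

def heavy_ball_sgd(x0, n_steps=5000, gamma0=0.02, alpha=0.6, beta=0.9, seed=0):
    # Momentum SGD: v <- beta*v - gamma_n*g, x <- x + v, with gamma_n = gamma0*n**(-alpha).
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for n in range(1, n_steps + 1):
        v = beta * v - gamma0 * n ** (-alpha) * noisy_grad(x, rng)
        x = x + v
    return x

def adagrad(x0, n_steps=5000, eta=0.1, eps=1e-8, seed=0):
    # Diagonal AdaGrad: each coordinate's step is eta / sqrt(sum of its squared past gradients).
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    G = np.zeros_like(x)
    for _ in range(n_steps):
        g = noisy_grad(x, rng)
        G += g ** 2
        x = x - eta * g / (np.sqrt(G) + eps)
    return x

x0 = [0.5, -1.5, 2.0]
x_msgd = heavy_ball_sgd(x0)
x_ada = adagrad(x0)
print(x_msgd, x_ada)
```

Both runs should end near a minimizer, with each |x_i| close to 1. Which sign a coordinate settles on depends on the starting point and, for the momentum method, on how far it overshoots early on. In AdaGrad the step size is never scheduled explicitly; it shrinks automatically as squared gradients accumulate.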
Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm. We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size. Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
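The preceding entry describes stochastic gradient descent-ascent (SGDA) with a constant step size converging linearly to a neighborhood of the solution. A minimal sketch of that behavior follows. It uses a hypothetical game, min over x and max over y of (mu/2)|x|^2 + x^T B y - (mu/2)|y|^2, whose gradient operator is strongly monotone. This simple case is chosen for illustration and is not meant to reproduce the paper's expected co-coercivity condition or its arbitrary-sampling setting. Gaussian noise stands in for sampling noise.

```python
import numpy as np

def sgda(z0, B, mu=1.0, gamma=0.05, sigma=0.1, n_steps=1000, seed=0):
    # Simultaneous stochastic gradient descent-ascent with constant step gamma on
    #   min_x max_y  (mu/2)|x|^2 + x^T B y - (mu/2)|y|^2,
    # whose unique equilibrium is (x, y) = (0, 0). Returns the distance to it per step.
    rng = np.random.default_rng(seed)
    d = B.shape[0]
    x, y = np.array(z0[:d], dtype=float), np.array(z0[d:], dtype=float)
    dists = []
    for _ in range(n_steps):
        gx = mu * x + B @ y + sigma * rng.normal(size=d)     # noisy gradient in x
        gy = B.T @ x - mu * y + sigma * rng.normal(size=d)   # noisy gradient in y
        x, y = x - gamma * gx, y + gamma * gy                 # descend in x, ascend in y
        dists.append(np.sqrt(x @ x + y @ y))
    return np.array(dists)

dists = sgda(np.full(4, 5.0), B=2.0 * np.eye(2))
print(dists[0], dists[-200:].mean())
```

The distance to the equilibrium shrinks geometrically at first and then fluctuates at a level that grows with the step size and the noise. That floor is the "neighborhood" in the constant-step guarantee; a decreasing step size would be needed to drive it to zero.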
An improved convergence analysis for decentralized online stochastic non-convex optimization [17.386715847732468]
In this paper, we study decentralized online stochastic non-convex optimization and analyze decentralized SGD combined with a technique called gradient tracking (GT), including objectives that satisfy the Polyak-Lojasiewicz (PL) condition. The results are not only immediately applicable but also improve the currently known best convergence rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z)
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate. We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
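Iterate averaging, the ingredient named in the preceding entry, can be shown in a few lines. The sketch below runs constant-step SGD on a streaming linear least-squares problem and keeps a Polyak-Ruppert running average of the iterates. It does not model the neural tangent kernel regime. The dimension, step size, noise level, and ground-truth weights are hypothetical.

```python
import numpy as np

def averaged_sgd(n_steps=20000, d=5, gamma=0.05, noise=0.5, seed=0):
    # Constant-step SGD on least squares with Polyak-Ruppert averaging of the iterates.
    # Returns (error of last iterate, error of averaged iterate).
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d)            # hypothetical ground-truth weights
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for n in range(1, n_steps + 1):
        x = rng.normal(size=d)             # fresh sample each step (streaming setting)
        y = x @ w_star + noise * rng.normal()
        w = w - gamma * (x @ w - y) * x    # stochastic gradient of (x^T w - y)^2 / 2
        w_bar += (w - w_bar) / n           # running mean of w_1, ..., w_n
    return np.linalg.norm(w - w_star), np.linalg.norm(w_bar - w_star)

last_err, avg_err = averaged_sgd()
print(last_err, avg_err)
```

With a constant step the last iterate keeps fluctuating around the optimum at a level set by the step size and noise, while the running average settles much closer to it. The running-mean update avoids storing the whole trajectory.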
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting. To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD even with non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.