Convergence of stochastic gradient descent under a local Lojasiewicz
condition for deep neural networks
- URL: http://arxiv.org/abs/2304.09221v2
- Date: Fri, 12 Jan 2024 23:41:44 GMT
- Title: Convergence of stochastic gradient descent under a local Lojasiewicz
condition for deep neural networks
- Authors: Jing An and Jianfeng Lu
- Abstract summary: We establish the local convergence of SGD with positive probability under a local Łojasiewicz condition.
We provide examples of neural networks with finite widths such that our assumptions hold.
- Score: 7.9626223030099545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the convergence of stochastic gradient descent (SGD) for non-convex
objective functions. We establish the local convergence with positive
probability under the local Łojasiewicz condition introduced by Chatterjee
in [chatterjee2022convergence] and an additional local structural
assumption of the loss function landscape. A key component of our proof is to
ensure that the whole trajectories of SGD stay inside the local region with a
positive probability. We also provide examples of neural networks with finite
widths such that our assumptions hold.
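For orientation, a simplified schematic of the setting (the notation below is illustrative; the precise formulation of Chatterjee's condition and the constants used in the paper differ in detail): for a non-negative loss $L$ and an initialization $\theta_0$, a local Łojasiewicz condition on a ball $B(\theta_0, r)$ together with the SGD iteration can be written as
\[
\|\nabla L(\theta)\|^2 \;\ge\; \mu\, L(\theta) \quad \text{for all } \theta \in B(\theta_0, r),\ \mu > 0,
\qquad
\theta_{k+1} \;=\; \theta_k - \eta\, g_k, \quad \mathbb{E}[g_k \mid \theta_k] = \nabla L(\theta_k),
\]
and the conclusion stated in the abstract is that, with positive probability, the whole trajectory $(\theta_k)_{k\ge 0}$ stays in $B(\theta_0, r)$ and $L(\theta_k)$ converges to zero.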
Related papers
- Numerically assisted determination of local models in network scenarios [55.2480439325792]
We develop a numerical tool for finding explicit local models that reproduce a given statistical behaviour.
We provide conjectures for the critical visibilities of the Greenberger-Horne-Zeilinger (GHZ) and W distributions.
The developed codes and documentation are publicly available at281.com/mariofilho/localmodels.
arXiv Detail & Related papers (2023-03-17T13:24:04Z) - From Gradient Flow on Population Loss to Learning with Stochastic
Gradient Descent [50.4531316289086]
Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models.
An overarching problem is to provide general conditions under which SGD converges, assuming that gradient flow (GF) on the population loss converges.
We provide a unified analysis for GD/SGD not only in classical settings such as convex losses, but also for more complex problems including phase retrieval and matrix square root.
arXiv Detail & Related papers (2022-10-13T03:55:04Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Local Stochastic Factored Gradient Descent for Distributed Quantum State
Tomography [10.623470454359431]
Local Stochastic Factored Gradient Descent (Local SFGD) is a distributed Quantum State Tomography (QST) protocol.
Local SFGD converges locally to a small neighborhood of the global optimum at a linear rate with a constant step size.
arXiv Detail & Related papers (2022-03-22T10:03:16Z) - Robust Estimation for Nonparametric Families via Generative Adversarial
Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems.
Our work extends these results to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
arXiv Detail & Related papers (2022-02-02T20:11:33Z) - On generalization bounds for deep networks based on loss surface
implicit regularization [5.68558935178946]
Modern deep neural networks generalize well despite having a large number of parameters, which contradicts classical statistical learning theory.
arXiv Detail & Related papers (2022-01-12T16:41:34Z) - Convergence of stochastic gradient descent schemes for
Lojasiewicz-landscapes [0.0]
We consider convergence of stochastic gradient descent (SGD) schemes under weak assumptions on the underlying landscape.
In particular, we show that for neural networks with analytic activation function such as softplus, sigmoid and the hyperbolic tangent, SGD converges on the event of staying bounded.
arXiv Detail & Related papers (2021-02-16T12:42:25Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave (a toy sketch of the SGLD update appears after this list).
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.