The Dynamics of Gradient Descent for Overparametrized Neural Networks
- URL: http://arxiv.org/abs/2105.06569v1
- Date: Thu, 13 May 2021 22:20:30 GMT
- Title: The Dynamics of Gradient Descent for Overparametrized Neural Networks
- Authors: Siddhartha Satpathi and R Srikant
- Abstract summary: We show that the dynamics of neural network weights under GD converge to a point which is close to the minimum norm solution.
To illustrate the application of this result, we show that GD converges to a prediction function that generalizes well.
- Score: 19.11271777632797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the dynamics of gradient descent (GD) in overparameterized single
hidden layer neural networks with a squared loss function. Recently, it has
been shown that, under some conditions, the parameter values obtained using GD
achieve zero training error and generalize well if the initial conditions are
chosen appropriately. Here, through a Lyapunov analysis, we show that the
dynamics of neural network weights under GD converge to a point which is close
to the minimum norm solution subject to the condition that there is no training
error when using the linear approximation to the neural network. To illustrate
the application of this result, we show that the GD converges to a prediction
function that generalizes well, thereby providing an alternative proof of the
generalization results in Arora et al. (2019).
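Below is a minimal numerical sketch of the setting described in the abstract: gradient descent with a squared loss on an overparameterized one-hidden-layer ReLU network, compared against the minimum-norm interpolating solution of the network linearized at initialization, i.e. $W_0 + \arg\min \|\Delta W\|$ subject to $f(W_0) + J\,\mathrm{vec}(\Delta W) = y$, which is the point the paper's Lyapunov analysis shows the GD iterates stay close to. The data, width, step size, initialization scale, and the choice to freeze the output-layer weights are illustrative assumptions for this sketch, not the paper's exact construction.

```python
# Sketch only: GD on an overparameterized one-hidden-layer ReLU net with squared loss,
# versus the minimum-norm interpolant of the model linearized at initialization.
# All sizes and hyperparameters below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                               # samples, input dim, hidden width (m >> n)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W0 = rng.standard_normal((m, d)) / np.sqrt(d)       # trained hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # output weights, frozen here for simplicity

def predict(W):
    # f(x) = a^T ReLU(W x), evaluated on the training inputs X
    return np.maximum(X @ W.T, 0.0) @ a

# Jacobian of the predictions w.r.t. vec(W) at W0: features of the linearized model.
mask0 = (X @ W0.T > 0).astype(float)                # (n, m) ReLU activation pattern at init
J = (mask0 * a)[:, :, None] * X[:, None, :]         # (n, m, d)
J = J.reshape(n, m * d)

# Gradient descent on the squared loss of the nonlinear network.
W, lr = W0.copy(), 0.5
for _ in range(3000):
    r = predict(W) - y                              # residuals
    mask = (X @ W.T > 0).astype(float)
    W -= lr * (mask * a * r[:, None]).T @ X / n     # dL/dW for L = (1/2n) * sum r_i^2

# Minimum-norm solution of the linearized model: dW = J^T (J J^T)^{-1} (y - f(W0)).
dW = J.T @ np.linalg.solve(J @ J.T, y - predict(W0))
W_lin = W0 + dW.reshape(m, d)

print("GD train loss:           ", 0.5 * np.mean((predict(W) - y) ** 2))
print("min-norm linearized loss:", 0.5 * np.mean((predict(W_lin) - y) ** 2))
print("||W_gd - W_lin||_F:      ", np.linalg.norm(W - W_lin))
```

In the regime the paper considers (large width with suitable initialization), the reported Frobenius distance between the GD iterate and the minimum-norm linearized solution should be small relative to the overall weight update, which is the qualitative content of the convergence result sketched above.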
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Convergence Analysis for Learning Orthonormal Deep Linear Neural
Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z) - How many Neurons do we need? A refined Analysis for Shallow Networks
trained with Gradient Descent [0.0]
We analyze the generalization properties of two-layer neural networks in the neural tangent kernel regime.
We derive fast rates of convergence that are known to be minimax optimal in the framework of non-parametric regression.
arXiv Detail & Related papers (2023-09-14T22:10:28Z) - Fast Convergence in Learning Two-Layer Neural Networks with Separable
Data [37.908159361149835]
We study normalized gradient descent on two-layer neural nets.
We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum.
arXiv Detail & Related papers (2023-05-22T20:30:10Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots may occur at locations distinct from the data points.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Implicit Geometric Regularization for Learning Shapes [34.052738965233445]
We offer a new paradigm for computing high-fidelity implicit neural representations directly from raw data.
We show that our method leads to state-of-the-art implicit neural representations with a higher level of detail and fidelity compared to previous methods.
arXiv Detail & Related papers (2020-02-24T07:36:32Z)