The Dynamics of Gradient Descent for Overparametrized Neural Networks
- URL: http://arxiv.org/abs/2105.06569v1
- Date: Thu, 13 May 2021 22:20:30 GMT
- Title: The Dynamics of Gradient Descent for Overparametrized Neural Networks
- Authors: Siddhartha Satpathi and R Srikant
- Abstract summary: We show that the dynamics of neural network weights under GD converge to a point which is close to the minimum norm solution.
To illustrate the application of this result, we show that GD converges to a prediction function that generalizes well.
- Score: 19.11271777632797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the dynamics of gradient descent (GD) in overparameterized single
hidden layer neural networks with a squared loss function. Recently, it has
been shown that, under some conditions, the parameter values obtained using GD
achieve zero training error and generalize well if the initial conditions are
chosen appropriately. Here, through a Lyapunov analysis, we show that the
dynamics of neural network weights under GD converge to a point which is close
to the minimum norm solution subject to the condition that there is no training
error when using the linear approximation to the neural network. To illustrate
the application of this result, we show that the GD converges to a prediction
function that generalizes well, thereby providing an alternative proof of the
generalization results in Arora et al. (2019).
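Below is a minimal numerical sketch of the setting described in the abstract: gradient descent with a squared loss on an overparameterized one-hidden-layer ReLU network, compared against the minimum-norm interpolating solution of the network linearized at initialization, i.e. $W_0 + \arg\min \|\Delta W\|$ subject to $f(W_0) + J\,\mathrm{vec}(\Delta W) = y$, which is the point the paper's Lyapunov analysis shows the GD iterates stay close to. The data, width, step size, initialization scale, and the choice to freeze the output-layer weights are illustrative assumptions for this sketch, not the paper's exact construction.

```python
# Sketch only: GD on an overparameterized one-hidden-layer ReLU net with squared loss,
# versus the minimum-norm interpolant of the model linearized at initialization.
# All sizes and hyperparameters below are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                               # samples, input dim, hidden width (m >> n)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W0 = rng.standard_normal((m, d)) / np.sqrt(d)       # trained hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # output weights, frozen here for simplicity

def predict(W):
    # f(x) = a^T ReLU(W x), evaluated on the training inputs X
    return np.maximum(X @ W.T, 0.0) @ a

# Jacobian of the predictions w.r.t. vec(W) at W0: features of the linearized model.
mask0 = (X @ W0.T > 0).astype(float)                # (n, m) ReLU activation pattern at init
J = (mask0 * a)[:, :, None] * X[:, None, :]         # (n, m, d)
J = J.reshape(n, m * d)

# Gradient descent on the squared loss of the nonlinear network.
W, lr = W0.copy(), 0.5
for _ in range(3000):
    r = predict(W) - y                              # residuals
    mask = (X @ W.T > 0).astype(float)
    W -= lr * (mask * a * r[:, None]).T @ X / n     # dL/dW for L = (1/2n) * sum r_i^2

# Minimum-norm solution of the linearized model: dW = J^T (J J^T)^{-1} (y - f(W0)).
dW = J.T @ np.linalg.solve(J @ J.T, y - predict(W0))
W_lin = W0 + dW.reshape(m, d)

print("GD train loss:           ", 0.5 * np.mean((predict(W) - y) ** 2))
print("min-norm linearized loss:", 0.5 * np.mean((predict(W_lin) - y) ** 2))
print("||W_gd - W_lin||_F:      ", np.linalg.norm(W - W_lin))
```

In the regime the paper considers (large width with suitable initialization), the reported Frobenius distance between the GD iterate and the minimum-norm linearized solution should be small relative to the overall weight update, which is the qualitative content of the convergence result sketched above.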
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Convergence Analysis for Learning Orthonormal Deep Linear Neural
Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z) - How many Neurons do we need? A refined Analysis for Shallow Networks
trained with Gradient Descent [0.0]
We analyze the generalization properties of two-layer neural networks in the neural tangent kernel regime.
We derive fast rates of convergence that are known to be minimax optimal in the framework of non-parametric regression.
arXiv Detail & Related papers (2023-09-14T22:10:28Z) - Fast Convergence in Learning Two-Layer Neural Networks with Separable
Data [37.908159361149835]
We study normalized gradient descent on two-layer neural nets.
We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum.
arXiv Detail & Related papers (2023-05-22T20:30:10Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots may occur at locations distinct from the data points.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Implicit Geometric Regularization for Learning Shapes [34.052738965233445]
We offer a new paradigm for computing high-fidelity implicit neural representations directly from raw data.
We show that our method leads to state-of-the-art implicit neural representations with a higher level of detail and fidelity compared to previous methods.
arXiv Detail & Related papers (2020-02-24T07:36:32Z)