Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs
- URL: http://arxiv.org/abs/2002.10801v3
- Date: Wed, 29 Jul 2020 13:30:54 GMT
- Title: Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of DNNs
- Authors: Lei Huang, Jie Qin, Li Liu, Fan Zhu, Ling Shao
- Abstract summary: We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics.
We show that batch normalization (BN) can stabilize training, but can sometimes create the false impression of a local minimum.
We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
- Score: 115.35745188028169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditioning analysis uncovers the landscape of an optimization objective by
exploring the spectrum of its curvature matrix. This has been well explored
theoretically for linear models. We extend this analysis to deep neural
networks (DNNs) in order to investigate their learning dynamics. To this end,
we propose layer-wise conditioning analysis, which explores the optimization
landscape with respect to each layer independently. Such an analysis is
theoretically supported under mild assumptions that approximately hold in
practice. Based on our analysis, we show that batch normalization (BN) can
stabilize training, but can sometimes create the false impression of a local
minimum, which has detrimental effects on learning. In addition, we
experimentally observe that BN can improve the layer-wise conditioning of the
optimization problem. Finally, we find that the last linear layer of a very
deep residual network displays ill-conditioned behavior. We solve this problem
by adding a single BN layer before the last linear layer, which achieves
improved performance over the original and pre-activation residual networks.
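As a concrete illustration of the remedy described above, the following PyTorch sketch inserts a single BatchNorm1d layer before the final linear (classifier) layer and uses the condition number of the feature covariance as a rough proxy for the layer-wise conditioning of that layer. All module and helper names here are hypothetical and for illustration only; they are not taken from the authors' code.

```python
import torch
import torch.nn as nn

def feature_condition_number(feats: torch.Tensor, eps: float = 1e-12) -> float:
    """Condition number of the covariance of the features feeding a linear layer.

    A large value indicates an ill-conditioned layer-wise landscape for that
    layer, in the spirit of the layer-wise conditioning analysis described above.
    """
    feats = feats - feats.mean(dim=0, keepdim=True)    # center over the batch
    cov = feats.T @ feats / feats.shape[0]             # d x d covariance matrix
    eigvals = torch.linalg.eigvalsh(cov)               # symmetric -> real eigenvalues
    return (eigvals.max() / eigvals.clamp_min(eps).min()).item()

# Hypothetical backbone producing 512-d features (stands in for a deep ResNet trunk).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU())

# Original head: features -> last linear (classifier) layer.
plain_head = nn.Linear(512, 10)

# The paper's remedy: one BN layer inserted before the last linear layer.
bn_head = nn.Sequential(nn.BatchNorm1d(512), nn.Linear(512, 10))

x = torch.randn(256, 3, 32, 32)                        # dummy CIFAR-sized batch
feats = backbone(x)

print("condition number of raw features:", feature_condition_number(feats))
with torch.no_grad():
    bn_feats = bn_head[0](feats)                       # features after the extra BN layer
print("condition number after BN:", feature_condition_number(bn_feats))
```

On random features the second number is typically much smaller, since BN standardizes each feature dimension; the effect on a trained deep residual network is what the paper measures.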
Related papers
- Taming Gradient Oversmoothing and Expansion in Graph Neural Networks [3.0764244780817283]
Oversmoothing has been claimed as a primary bottleneck for graph neural networks (GNNs).
We show the presence of $\textit{gradient oversmoothing}$, which prevents optimization during training.
We provide a simple yet effective normalization method to prevent the gradient expansion.
arXiv Detail & Related papers (2024-10-07T08:22:20Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze the closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide a convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
- Stabilizing RNN Gradients through Pre-training [3.335932527835653]
Theories of learning propose preventing the gradient from growing exponentially with depth or time in order to stabilize and improve training.
We extend known stability theories to encompass a broader family of deep recurrent networks, requiring minimal assumptions on data and parameter distribution.
We propose a new approach to mitigate this issue, which consists of giving a weight of one half to the time and depth contributions to the gradient.
arXiv Detail & Related papers (2023-08-23T11:48:35Z)
- No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths [12.068608358926317]
First-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks.
We focus on the fundamental geometric properties of quantities sampled along two key optimization paths.
Our findings suggest that not only do optimization trajectories never encounter significant obstacles, but they also maintain stable dynamics during the majority of training.
arXiv Detail & Related papers (2023-06-20T22:10:40Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- What can linear interpolation of neural network loss landscapes tell us? [11.753360538833139]
Loss landscapes are notoriously difficult to visualize in a human-comprehensible fashion.
One common way to address this problem is to plot linear slices of the landscape; a minimal sketch of such a slice appears after this list.
arXiv Detail & Related papers (2021-06-30T11:54:04Z)
- Kernel-Based Smoothness Analysis of Residual Networks [85.20737467304994]
Residual networks (ResNets) stand out among powerful modern architectures.
In this paper, we show another distinction between ResNets and fully-connected networks, namely, a tendency of ResNets to promote smoother interpolations.
arXiv Detail & Related papers (2020-09-21T16:32:04Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
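The linear-slice technique mentioned above (in the loss-landscape interpolation entry) amounts to evaluating the loss at parameters theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b for alpha in [0, 1]. Below is a minimal PyTorch sketch; the model and helper names are hypothetical and do not come from that paper's code.

```python
import copy
import torch
import torch.nn as nn

def loss_along_line(model_a, model_b, loss_fn, data, targets, steps=11):
    """Evaluate the loss on the straight line between two parameter settings."""
    probe = copy.deepcopy(model_a)                      # scratch copy whose weights we overwrite
    losses = []
    with torch.no_grad():
        for alpha in torch.linspace(0.0, 1.0, steps):
            for p, pa, pb in zip(probe.parameters(),
                                 model_a.parameters(),
                                 model_b.parameters()):
                p.copy_((1 - alpha) * pa + alpha * pb)  # theta(alpha)
            losses.append(loss_fn(probe(data), targets).item())
    return losses

# Hypothetical example: slice between an initialization and a perturbed "solution".
net_init = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
net_end = copy.deepcopy(net_init)
for p in net_end.parameters():                          # stand-in for a trained model
    p.data.add_(0.1 * torch.randn_like(p))

x, y = torch.randn(128, 20), torch.randint(0, 10, (128,))
print(loss_along_line(net_init, net_end, nn.CrossEntropyLoss(), x, y))
```

Plotting the returned losses against alpha gives the one-dimensional slice discussed in that paper.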
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.