Shallow Univariate ReLu Networks as Splines: Initialization, Loss
Surface, Hessian, & Gradient Flow Dynamics
- URL: http://arxiv.org/abs/2008.01772v1
- Date: Tue, 4 Aug 2020 19:19:49 GMT
- Title: Shallow Univariate ReLu Networks as Splines: Initialization, Loss
Surface, Hessian, & Gradient Flow Dynamics
- Authors: Justin Sahs, Ryan Pyle, Aneel Damaraju, Josue Ortega Caro, Onur
Tavaslioglu, Andy Lu, Ankit Patel
- Abstract summary: We propose reparametrizing ReLU NNs as continuous piecewise linear splines.
We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum.
Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
- Score: 1.5393457051344297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the learning dynamics and inductive bias of neural networks
(NNs) is hindered by the opacity of the relationship between NN parameters and
the function represented. We propose reparametrizing ReLU NNs as continuous
piecewise linear splines. Using this spline lens, we study learning dynamics in
shallow univariate ReLU NNs, finding unexpected insights and explanations for
several perplexing phenomena. We develop a surprisingly simple and transparent
view of the structure of the loss surface, including its critical and fixed
points, Hessian, and Hessian spectrum. We also show that standard weight
initializations yield very flat functions, and that this flatness, together
with overparametrization and the initial weight scale, is responsible for the
strength and type of implicit regularization, consistent with recent work
arXiv:1906.05827. Our implicit regularization results are complementary to
recent work arXiv:1906.07842, done independently, which showed that
initialization scale critically controls implicit regularization via a
kernel-based argument. Our spline-based approach reproduces their key implicit
regularization results but in a far more intuitive and transparent manner.
Going forward, our spline-based approach is likely to extend naturally to the
multivariate and deep settings, and will play a foundational role in efforts to
understand neural networks. Videos of learning dynamics using a spline-based
visualization are available at http://shorturl.at/tFWZ2.
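As a quick illustration of the spline lens described in the abstract (a minimal NumPy sketch under an assumed width and Gaussian initialization scale, not the authors' parametrization or code), a shallow univariate ReLU network f(x) = Σ_i v_i·ReLU(w_i·x + b_i) + c is a continuous piecewise-linear spline: unit i contributes a breakpoint at x_i = -b_i/w_i where the slope jumps by v_i·|w_i|.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shallow univariate ReLU net: f(x) = sum_i v_i * relu(w_i * x + b_i) + c.
# Width and Gaussian init scales are illustrative, not the paper's exact setup.
H = 64
w = rng.normal(0.0, 1.0, H)               # input -> hidden weights
b = rng.normal(0.0, 1.0, H)               # hidden biases
v = rng.normal(0.0, 1.0 / np.sqrt(H), H)  # hidden -> output weights
c = 0.0                                   # output bias

def f(x):
    """Evaluate the network at the points in the 1-D array x."""
    pre = np.outer(x, w) + b              # pre-activations, shape (len(x), H)
    return np.maximum(pre, 0.0) @ v + c

# Spline view: unit i contributes a breakpoint (knot) at x_i = -b_i / w_i,
# where the slope of f jumps by v_i * |w_i|.
knots = -b / w
slope_jumps = v * np.abs(w)

# For x below every knot, only units with w_i < 0 are active.
slope_leftmost = np.sum(v[w < 0] * w[w < 0])

def spline_slope(x0):
    """Slope of f just to the right of x0, computed from the spline view."""
    return slope_leftmost + slope_jumps[knots <= x0].sum()

# Sanity check: the spline slopes agree with finite differences of f itself.
for x0 in (-2.0, 0.3, 1.7):
    eps = 1e-6
    fd = (f(np.array([x0 + 2 * eps])) - f(np.array([x0 + eps])))[0] / eps
    print(f"x0={x0:+.1f}  spline slope={spline_slope(x0):+.4f}  finite diff={fd:+.4f}")

# Inspect how flat the function is at initialization over an input interval.
xs = np.linspace(-3, 3, 1001)
print("range of f at init:", f(xs).min(), f(xs).max())
```

Printing the breakpoints, slopes, and value range of the freshly initialized network is one way to inspect the flatness-at-initialization behavior the abstract refers to.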
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - A Dynamics Theory of Implicit Regularization in Deep Low-Rank Matrix
Factorization [21.64166573203593]
Implicit regularization is an important way to interpret neural networks.
Recent theory has begun to explain implicit regularization via the model of deep matrix factorization (DMF).
arXiv Detail & Related papers (2022-12-29T02:11:19Z) - Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems [7.045072177165241]
We augment a piecewise-linear recurrent neural network (RNN) by a linear spline basis expansion.
We show that this approach retains all the theoretically appealing properties of the simple PLRNN, yet boosts its capacity for approximating arbitrary nonlinear dynamical systems in comparatively low dimensions.
arXiv Detail & Related papers (2022-07-06T09:43:03Z) - Support Vectors and Gradient Dynamics for Implicit Bias in ReLU Networks [45.886537625951256]
We study gradient flow dynamics in the parameter space when training single-neuron ReLU networks.
Specifically, we discover implicit bias in terms of support vectors in ReLU networks, which play a key role in why and how ReLU networks generalize well.
arXiv Detail & Related papers (2022-02-11T08:55:58Z) - On the Explicit Role of Initialization on the Convergence and Implicit
Bias of Overparametrized Linear Networks [1.0323063834827415]
We present a novel analysis of single-hidden-layer linear networks trained under gradient flow.
We show that the squared loss converges exponentially to its optimum.
We derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
arXiv Detail & Related papers (2021-05-13T15:13:51Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz
Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper-bound of the Lipschitz constant.
We numerically compare our scheme with standard ReLU networks and their variations, PReLU and LeakyReLU; a sketch of these baseline activations follows below.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)
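For reference, here is a minimal NumPy sketch of the baseline activations compared against in that last entry, together with the standard product upper bound on a feed-forward network's Lipschitz constant that such schemes aim to control. This is not the paper's framework; the layer shapes, the default alpha, and the function names are illustrative assumptions.

```python
import numpy as np

# Baseline activations (alpha values are illustrative).
def relu(x):
    return np.maximum(x, 0.0)             # Lipschitz constant 1

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # Lipschitz constant max(1, |alpha|)

def prelu(x, alpha):
    # Same form as LeakyReLU, but alpha is a trainable parameter.
    return np.where(x > 0, x, alpha * x)

# A standard upper bound on a feed-forward network's Lipschitz constant:
# the product of the weight matrices' spectral norms times the activations'
# Lipschitz constants (one activation between each pair of layers).
def lipschitz_upper_bound(weight_matrices, activation_lipschitz=1.0):
    spectral_norms = [np.linalg.norm(W, 2) for W in weight_matrices]
    return float(np.prod(spectral_norms)) * activation_lipschitz ** (len(weight_matrices) - 1)

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(16, 1)), rng.normal(size=(16, 16)), rng.normal(size=(1, 16))]
print("Lipschitz upper bound (ReLU activations):", lipschitz_upper_bound(Ws))
```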
This list is automatically generated from the titles and abstracts of the papers on this site.