Transition to Linearity of Wide Neural Networks is an Emerging Property
of Assembling Weak Models
- URL: http://arxiv.org/abs/2203.05104v1
- Date: Thu, 10 Mar 2022 01:27:01 GMT
- Title: Transition to Linearity of Wide Neural Networks is an Emerging Property
of Assembling Weak Models
- Authors: Chaoyue Liu, Libin Zhu, Mikhail Belkin
- Abstract summary: Wide neural networks with a linear output layer have been shown to be near-linear and to have a near-constant neural tangent kernel (NTK).
We show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
- Score: 20.44438519046223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Wide neural networks with a linear output layer have been shown to be
near-linear, and to have a near-constant neural tangent kernel (NTK), in a region
containing the optimization path of gradient descent. These findings seem
counter-intuitive since in general neural networks are highly complex models.
Why does a linear structure emerge when the networks become wide? In this work,
we provide a new perspective on this "transition to linearity" by considering a
neural network as an assembly model recursively built from a set of sub-models
corresponding to individual neurons. In this view, we show that the linearity
of wide neural networks is, in fact, an emerging property of assembling a large
number of diverse "weak" sub-models, none of which dominate the assembly.
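The claim can be checked numerically. The sketch below is not from the paper; it assumes a hypothetical two-layer ReLU network in NTK parameterization (output scaled by 1/sqrt(m)) and uses numpy to compare the empirical tangent kernel at initialization with the kernel after an O(1)-norm parameter perturbation. If the transition to linearity holds, the relative change of the kernel should shrink as the width m grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(m, d):
    """NTK-style parameterization: f(x) = v @ relu(W x) / sqrt(m)."""
    W = rng.standard_normal((m, d))
    v = rng.standard_normal(m)
    return W, v

def ntk(X, W, v):
    """Empirical tangent kernel K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
    m = W.shape[0]
    pre = X @ W.T                      # (n, m) pre-activations
    act = np.maximum(pre, 0.0)         # relu(W x)
    mask = (pre > 0).astype(float)     # relu'(W x)
    # contribution of the output weights: df/dv_k = act[:, k] / sqrt(m)
    K_v = act @ act.T / m
    # contribution of the hidden weights: df/dW_k = v_k * mask[:, k] * x / sqrt(m)
    G = mask * v                       # (n, m)
    K_w = (G @ G.T) * (X @ X.T) / m
    return K_v + K_w

d, n = 10, 20
X = rng.standard_normal((n, d)) / np.sqrt(d)

for m in [100, 1000, 10000, 100000]:
    W, v = init(m, d)
    K0 = ntk(X, W, v)
    # random parameter perturbation of unit Euclidean norm,
    # roughly the scale traversed by gradient descent during training
    dW = rng.standard_normal((m, d))
    dv = rng.standard_normal(m)
    scale = 1.0 / np.sqrt(np.sum(dW**2) + np.sum(dv**2))
    K1 = ntk(X, W + scale * dW, v + scale * dv)
    rel = np.linalg.norm(K1 - K0) / np.linalg.norm(K0)
    print(f"m={m:>6}  relative NTK change = {rel:.4f}")
```

The 1/sqrt(m) output scaling is what makes each neuron's contribution "weak" in the sense of the abstract: no single sub-model dominates the assembly, and the aggregate behaves increasingly like its linearization as m grows.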
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - When Deep Learning Meets Polyhedral Theory: A Survey [6.899761345257773]
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks.
Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions.
arXiv Detail & Related papers (2023-04-29T11:46:53Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Quadratic models for understanding catapult dynamics of neural networks [15.381097076708535]
We show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" that arises when training such models with large learning rates.
Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.
arXiv Detail & Related papers (2022-05-24T05:03:06Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - On the linearity of large non-linear models: when and why the tangent
kernel is constant [20.44438519046223]
We shed light on the remarkable phenomenon of transition to linearity of certain neural networks as their width approaches infinity.
We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network (a sketch of this argument appears after this list).
arXiv Detail & Related papers (2020-10-02T16:44:45Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L1$ and $L2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Learning Parities with Neural Networks [45.6877715768796]
We take a step towards showing learnability of models that are inherently non-linear.
We show that under certain distributions, sparse parities are learnable via gradient descent on a depth-two network.
arXiv Detail & Related papers (2020-02-18T06:44:17Z)
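For the "On the linearity of large non-linear models" entry above, the following LaTeX fragment is a sketch of the Hessian-norm argument as I understand it (a paraphrase of the standard second-order Taylor bound, not the paper's exact statements; the width is denoted m):

```latex
% Second-order Taylor expansion of the model f(w) around initialization w_0:
\[
  f(w) \;=\; f(w_0) \;+\; \nabla f(w_0)^{\top}(w - w_0)
         \;+\; \tfrac{1}{2}\,(w - w_0)^{\top} H(\xi)\,(w - w_0),
  \qquad \xi \in [w_0, w].
\]
% Hence, inside a ball of radius R around w_0, the deviation from the
% linearized model f_lin is controlled by the Hessian:
\[
  \bigl| f(w) - f_{\mathrm{lin}}(w) \bigr|
  \;\le\; \tfrac{1}{2}\, R^{2} \sup_{\|w - w_0\| \le R} \|H(w)\|.
\]
% If the spectral norm of the Hessian scales as \tilde{O}(1/\sqrt{m}),
% both this deviation and the change of the tangent kernel vanish as the
% width m grows, which is the claimed mechanism behind the constancy of
% the NTK.
```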
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.