Six Lectures on Linearized Neural Networks
- URL: http://arxiv.org/abs/2308.13431v1
- Date: Fri, 25 Aug 2023 15:23:06 GMT
- Title: Six Lectures on Linearized Neural Networks
- Authors: Theodor Misiakiewicz, Andrea Montanari
- Abstract summary: We first recall the correspondence between neural networks and linear models via the so-called lazy regime.
We then review four models for linearized neural networks: linear regression with concentrated features, kernel ridge regression, random feature model and neural tangent model.
- Score: 15.880450613409302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In these six lectures, we examine what can be learnt about the behavior of
multi-layer neural networks from the analysis of linear models. We first recall
the correspondence between neural networks and linear models via the so-called
lazy regime. We then review four models for linearized neural networks: linear
regression with concentrated features, kernel ridge regression, random feature
model and neural tangent model. Finally, we highlight the limitations of the
linear theory and discuss how other approaches can overcome them.
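As a concrete illustration of the lazy-regime correspondence described in the abstract, the sketch below linearizes a toy two-layer network around its initialization. This is a minimal sketch under assumed choices (the network, its width, and all function names are illustrative, not taken from the lectures), written in JAX.
```python
# A minimal sketch (not taken from the lectures) of the lazy-regime
# correspondence: around its initialization theta_0, the network
# f(x; theta) is replaced by its first-order Taylor expansion
#   f_lin(x; theta) = f(x; theta_0) + <grad_theta f(x; theta_0), theta - theta_0>,
# which is a model that is *linear* in the parameters theta.
# The toy two-layer network, its width, and all names below are illustrative.
import jax
import jax.numpy as jnp


def init_params(key, d_in=10, width=256):
    """Parameters of a toy two-layer network (illustrative choice)."""
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (width, d_in)) / jnp.sqrt(d_in),
        "w2": jax.random.normal(k2, (width,)) / jnp.sqrt(width),
    }


def f(params, x):
    """f(x; theta): two-layer ReLU network with scalar output."""
    return params["w2"] @ jax.nn.relu(params["W1"] @ x)


def linearize_at(net, params0):
    """Lazy-regime (neural-tangent) linearization of `net` at params0."""
    def net_lin(params, x):
        dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
        # jax.jvp returns net(params0, x) and its directional derivative
        # along theta - theta_0, i.e. the two terms of the Taylor expansion.
        y0, dy = jax.jvp(lambda p: net(p, x), (params0,), (dparams,))
        return y0 + dy
    return net_lin


params0 = init_params(jax.random.PRNGKey(0))
x = jax.random.normal(jax.random.PRNGKey(1), (10,))
f_lin = linearize_at(f, params0)

# The two models coincide at theta = theta_0; f_lin then stays linear in theta.
print(f(params0, x), f_lin(params0, x))
```
The tangent features $\nabla_\theta f(x;\theta_0)$ appearing in this expansion are what define the neural tangent model and the associated kernel reviewed in the lectures.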
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural networks and their training, applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK).
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models [13.283281356356161]
We review the literature on statistical theories of neural networks from three perspectives.
Results on excess risks for neural networks are reviewed.
Papers that attempt to answer "how the neural network finds the solution that can generalize well on unseen data" are reviewed.
arXiv Detail & Related papers (2024-01-14T02:30:19Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Quadratic models for understanding catapult dynamics of neural networks [15.381097076708535]
We show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" that arises when training such models with large learning rates.
Our analysis further demonstrates that quadratic models can be an effective tool for analysis of neural networks.
arXiv Detail & Related papers (2022-05-24T05:03:06Z)
- Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models [20.44438519046223]
Wide neural networks with a linear output layer have been shown to be near-linear and to have a near-constant neural tangent kernel (NTK).
We show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
arXiv Detail & Related papers (2022-03-10T01:27:01Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressibility afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
- Learning Parities with Neural Networks [45.6877715768796]
We take a step towards showing the learnability of models that are inherently non-linear.
We show that, under certain distributions, sparse parities are learnable via gradient descent on a depth-two network (the parity function is recalled after this list).
arXiv Detail & Related papers (2020-02-18T06:44:17Z)
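For context on the parity-learning entry above, the standard definition of a sparse parity is recalled here (textbook background, not text from that paper): for a subset $S \subseteq \{1,\dots,d\}$ with $|S| = k \ll d$, the parity over $x \in \{-1,+1\}^d$ is
$$ \chi_S(x) \;=\; \prod_{i \in S} x_i, $$
and learning a sparse parity means recovering $\chi_S$ from labeled samples $(x, \chi_S(x))$ without knowing $S$.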
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides (including everything above) and is not responsible for any consequences of its use.