Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide
Neural Networks
- URL: http://arxiv.org/abs/2205.09653v1
- Date: Thu, 19 May 2022 16:10:10 GMT
- Title: Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide
Neural Networks
- Authors: Blake Bordelon, Cengiz Pehlevan
- Abstract summary: We analyze feature learning in infinite width neural networks trained with gradient flow through a self-consistent dynamical field theory.
We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points.
- Score: 18.27510863075184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze feature learning in infinite width neural networks trained with
gradient flow through a self-consistent dynamical field theory. We construct a
collection of deterministic dynamical order parameters which are inner-product
kernels for hidden unit activations and gradients in each layer at pairs of
time points, providing a reduced description of network activity through
training. These kernel order parameters collectively define the hidden layer
activation distribution, the evolution of the neural tangent kernel, and
consequently output predictions. For deep linear networks, these kernels
satisfy a set of algebraic matrix equations. For nonlinear networks, we provide
an alternating sampling procedure to self-consistently solve for the kernel
order parameters. We provide comparisons of the self-consistent solution to
various approximation schemes including the static NTK approximation, gradient
independence assumption, and leading order perturbation theory, showing that
each of these approximations can break down in regimes where general
self-consistent solutions still provide an accurate description. Lastly, we
provide experiments in more realistic settings which demonstrate that the loss
and kernel dynamics of CNNs at fixed feature learning strength are preserved
across different widths on a CIFAR classification task.
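The kernel order parameters described in the abstract have a simple finite-width analogue. The sketch below is a minimal illustration, not the paper's DMFT solver: a toy two-layer tanh network is trained by full-batch gradient descent as a discrete-time stand-in for gradient flow, hidden activations are saved at a few time points, and the inner-product kernels Phi(t, t') = h(t) h(t')^T / N are formed for pairs of saved times alongside the empirical NTK. The architecture, data, feature-learning strength gamma, and all hyperparameters are illustrative assumptions.

```python
# Minimal finite-width sketch (an illustration, not the paper's DMFT solver).
import numpy as np

rng = np.random.default_rng(0)
P, D, N = 16, 8, 512            # samples, input dimension, hidden width
gamma = 1.0                     # feature-learning strength (illustrative)
lr, steps, save_every = 0.5, 300, 100

X = rng.standard_normal((P, D)) / np.sqrt(D)
y = np.sign(X[:, :1])                              # toy +/-1 targets, shape (P, 1)

W = rng.standard_normal((D, N)) / np.sqrt(D)       # first-layer weights
a = rng.standard_normal((N, 1)) / np.sqrt(N)       # readout weights

def hidden(W):
    return np.tanh(X @ W)                          # hidden activations, (P, N)

def empirical_ntk(W, a):
    # Jacobian inner products over the readout and first-layer parameters.
    h = hidden(W)
    dh = 1.0 - h**2
    return gamma**2 * (h @ h.T + (X @ X.T) * ((dh * a.T**2) @ dh.T)) / N

snapshots = []                                     # (step, activations, NTK)
for t in range(steps + 1):
    h = hidden(W)
    f = gamma * h @ a / np.sqrt(N)                 # network output, (P, 1)
    if t % save_every == 0:
        snapshots.append((t, h.copy(), empirical_ntk(W, a)))
    err = f - y                                    # gradient of 0.5 * sum of squares
    grad_a = gamma * h.T @ err / np.sqrt(N)
    grad_W = X.T @ ((gamma * err @ a.T / np.sqrt(N)) * (1.0 - h**2))
    a -= lr * grad_a
    W -= lr * grad_W

# Kernel order parameters at pairs of time points, as in the DMFT description:
# how far Phi(t, t') drifts from Phi(0, 0) reflects the amount of feature
# learning, which at infinite width is controlled by the richness parameter gamma.
for (ti, hi, Ki) in snapshots:
    for (tj, hj, _) in snapshots:
        Phi = hi @ hj.T / N
        print(f"Phi({ti:3d},{tj:3d}) mean diagonal = {np.trace(Phi)/P:.4f}")
    print(f"NTK({ti:3d}) trace/P = {np.trace(Ki)/P:.4f}")
```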
Related papers
- Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations [24.052411316664017]
We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representation for two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
arXiv Detail & Related papers (2024-10-26T07:10:47Z) - A theory of data variability in Neural Network Bayesian inference [0.70224924046445]
We provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks.
We derive the generalization properties from the statistical properties of the input.
We show that data variability leads to a non-Gaussian action reminiscent of a $(\varphi^3 + \varphi^4)$-theory.
arXiv Detail & Related papers (2023-07-31T14:11:32Z) - Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum [18.10812063219831]
We introduce Modified Spectrum Kernels (MSKs) to approximate kernels with desired eigenvalues.
We propose a preconditioned gradient descent method, which alters the trajectory of gradient descent.
Our method is both computationally efficient and simple to implement.
arXiv Detail & Related papers (2023-07-26T22:39:47Z) - Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean
Field Neural Networks [47.73646927060476]
We analyze the dynamics of finite width effects in wide but finite feature learning neural networks.
Our results are non-perturbative in the strength of feature learning.
arXiv Detail & Related papers (2023-04-06T23:11:49Z) - Simple initialization and parametrization of sinusoidal networks via
their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite
Networks [12.692279981822011]
We derive the covariance functions of multi-layer perceptrons with exponential linear units (ELU) and Gaussian error linear units (GELU).
We analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions.
We find that unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics.
arXiv Detail & Related papers (2020-02-20T01:25:39Z)
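Two of the entries above (the globally attracting fixed points of the kernel sequence, and the fixed-point dynamics of iterated ELU/GELU kernels) concern the same basic object: the layer-to-layer map on covariance kernels of an infinitely wide network. The sketch below is a hedged toy illustration of that map, not a reproduction of either paper's derivation: it estimates K^{l+1} = sigma_w^2 E[phi(u) phi(v)] + sigma_b^2 by Monte Carlo for a GELU activation and tracks how the correlation between two inputs evolves across layers. The activation choice, sigma_w^2 = 2, sigma_b^2 = 0, and the sample count are illustrative assumptions.

```python
# Toy illustration of iterated-kernel (NNGP) fixed-point dynamics.
import numpy as np
from scipy.stats import norm

def gelu(x):
    return x * norm.cdf(x)                          # GELU(x) = x * Phi(x)

def kernel_map(K, sigma_w2=2.0, sigma_b2=0.0, n_mc=200_000, rng=None):
    """One layer of the iterated kernel: Monte Carlo estimate of
    sigma_w^2 * E[phi(u) phi(v)] + sigma_b^2 for (u, v) ~ N(0, K), K 2x2."""
    if rng is None:
        rng = np.random.default_rng(0)
    L = np.linalg.cholesky(K + 1e-12 * np.eye(2))   # sample correlated Gaussians
    Z = rng.standard_normal((n_mc, 2)) @ L.T
    A = gelu(Z)
    return sigma_w2 * (A.T @ A) / n_mc + sigma_b2

# Two unit-variance inputs with initial correlation 0.3, iterated over layers.
K = np.array([[1.0, 0.3],
              [0.3, 1.0]])
rng = np.random.default_rng(1)
for layer in range(1, 16):
    K = kernel_map(K, rng=rng)
    corr = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    print(f"layer {layer:2d}: variance = {K[0, 0]:.4f}, correlation = {corr:.4f}")

# Whether the correlation is driven toward 1 (collapsing representations) or to
# a nontrivial fixed point depends on the activation and on sigma_w^2, sigma_b^2,
# which is the architecture dependence highlighted in the papers above.
```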