The edge of chaos: quantum field theory and deep neural networks
- URL: http://arxiv.org/abs/2109.13247v1
- Date: Mon, 27 Sep 2021 18:00:00 GMT
- Title: The edge of chaos: quantum field theory and deep neural networks
- Authors: Kevin T. Grosvenor and Ro Jefferson
- Abstract summary: We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explicitly construct the quantum field theory corresponding to a general
class of deep neural networks encompassing both recurrent and feedforward
architectures. We first consider the mean-field theory (MFT) obtained as the
leading saddlepoint in the action, and derive the condition for criticality via
the largest Lyapunov exponent. We then compute the loop corrections to the
correlation function in a perturbative expansion in the ratio of depth $T$ to
width $N$, and find a precise analogy with the well-studied $O(N)$ vector
model, in which the variance of the weight initializations plays the role of
the 't Hooft coupling. In particular, we compute both the $\mathcal{O}(1)$
corrections quantifying fluctuations from typicality in the ensemble of
networks, and the subleading $\mathcal{O}(T/N)$ corrections due to finite-width
effects. These provide corrections to the correlation length that controls the
depth to which information can propagate through the network, and thereby sets
the scale at which such networks are trainable by gradient descent. Our
analysis provides a first-principles approach to the rapidly emerging NN-QFT
correspondence, and opens several interesting avenues to the study of
criticality in deep neural networks.
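The criticality condition via the largest Lyapunov exponent can be made concrete with the standard mean-field recursion for a deep tanh network: iterate the preactivation variance to its fixed point $q^*$ and ask whether $\chi = \sigma_w^2\,\mathbb{E}_{z\sim\mathcal{N}(0,q^*)}[\phi'(z)^2]$ equals one, i.e. whether the Lyapunov exponent $\log\chi$ vanishes. The sketch below is a generic illustration of this criterion rather than the paper's construction; the tanh activation, the variances $\sigma_w$, $\sigma_b$, and the quadrature scheme are assumptions.
```python
# Minimal sketch (not the paper's construction): locate the edge of chaos for a
# deep tanh network in the mean-field limit.  The variance map is iterated to its
# fixed point q*, and criticality is identified with chi = sigma_w^2 E[phi'(z)^2] = 1,
# i.e. a vanishing Lyapunov exponent log(chi).
import numpy as np

# Gauss-Hermite (probabilists') nodes and weights for expectations over N(0, 1)
_nodes, _weights = np.polynomial.hermite_e.hermegauss(101)
_weights = _weights / _weights.sum()

def variance_fixed_point(sigma_w, sigma_b, iters=200):
    """Iterate q_l = sigma_w^2 E_{z~N(0,q_{l-1})}[tanh(z)^2] + sigma_b^2 to its fixed point."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.sum(_weights * np.tanh(np.sqrt(q) * _nodes)**2) + sigma_b**2
    return q

def lyapunov_exponent(sigma_w, sigma_b):
    """log(chi), with chi = sigma_w^2 E_{z~N(0,q*)}[tanh'(z)^2]; zero at criticality."""
    q = variance_fixed_point(sigma_w, sigma_b)
    dphi = 1.0 - np.tanh(np.sqrt(q) * _nodes)**2   # derivative of tanh
    chi = sigma_w**2 * np.sum(_weights * dphi**2)
    return np.log(chi)

# Scan the weight variance: the sign change of the exponent marks the
# ordered-to-chaotic transition (near sigma_w ~ 1 for small sigma_b).
for sigma_w in np.linspace(0.5, 2.5, 9):
    print(f"sigma_w = {sigma_w:.2f}   log(chi) = {lyapunov_exponent(sigma_w, 0.05):+.3f}")
```
Scanning $\sigma_w$ and locating the sign change of $\log\chi$ traces the ordered-to-chaotic phase boundary; the $\mathcal{O}(1)$ and $\mathcal{O}(T/N)$ corrections described in the abstract modify the correlation length around this mean-field result.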
Related papers
- Wide Neural Networks as Gaussian Processes: Lessons from Deep
Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
arXiv Detail & Related papers (2023-10-16T19:00:43Z) - Feature Learning and Generalization in Deep Networks with Orthogonal Weights [1.7956122940209063]
Deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality.
These networks still exhibit fluctuations that grow linearly with the depth of the network.
We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations that are independent of depth.
arXiv Detail & Related papers (2023-10-11T18:00:02Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots may occur at locations distinct from the data points.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Deep Networks and the Multiple Manifold Problem [15.144495799445824]
We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere.
We prove for a simple manifold configuration that when the network depth $L$ is large relative to certain geometric and statistical properties of the data, and the network width grows as a sufficiently large polynomial in $L$, randomly-initialized gradient descent rapidly learns to classify the two manifolds with high probability.
Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem.
arXiv Detail & Related papers (2020-08-25T19:20:00Z) - Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z) - A Revision of Neural Tangent Kernel-based Approaches for Neural Networks [34.75076385561115]
We use the neural tangent kernel to show that networks can fit any finite training sample perfectly.
A simple and analytic kernel function is derived and shown to be equivalent to a fully-trained network.
Our tighter analysis resolves the scaling problem and enables the validation of the original NTK-based results.
arXiv Detail & Related papers (2020-07-02T05:07:55Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
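The last entry proposes tracking the norm of the weights' Hessian as a diagnostic. As a generic illustration rather than the authors' estimator, the sketch below estimates the largest Hessian eigenvalue of a toy one-hidden-layer regression loss by power iteration on finite-difference Hessian-vector products; the toy model, the loss, and the step size eps are assumptions.
```python
# Minimal sketch (an illustration, not the cited paper's estimator): track the
# Hessian norm of a toy loss via power iteration on finite-difference
# Hessian-vector products.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))          # toy inputs
y = rng.normal(size=(64, 1))           # toy regression targets

def loss_grad(theta):
    """Gradient of a one-hidden-layer tanh regression loss w.r.t. flattened weights."""
    W1 = theta[:80].reshape(10, 8)
    W2 = theta[80:].reshape(8, 1)
    h = np.tanh(X @ W1)
    r = h @ W2 - y                     # residuals of L = ||h W2 - y||^2 / (2n)
    gW2 = h.T @ r / len(X)
    gW1 = X.T @ ((r @ W2.T) * (1 - h**2)) / len(X)
    return np.concatenate([gW1.ravel(), gW2.ravel()])

def hessian_norm(theta, iters=50, eps=1e-4):
    """Largest Hessian eigenvalue (in magnitude) by power iteration on finite-difference HVPs."""
    v = rng.normal(size=theta.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (loss_grad(theta + eps * v) - loss_grad(theta - eps * v)) / (2 * eps)
        lam = float(v @ hv)            # Rayleigh quotient estimate
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return abs(lam)

theta0 = rng.normal(scale=0.5, size=80 + 8)
print("estimated Hessian norm at initialization:", hessian_norm(theta0))
```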