Unified Field Theory for Deep and Recurrent Neural Networks
- URL: http://arxiv.org/abs/2112.05589v1
- Date: Fri, 10 Dec 2021 15:06:11 GMT
- Title: Unified Field Theory for Deep and Recurrent Neural Networks
- Authors: Kai Segadlo, Bastian Epping, Alexander van Meegen, David Dahmen,
Michael Krämer, Moritz Helias
- Abstract summary: We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$.
- Score: 56.735884560668985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding capabilities and limitations of different network architectures
is of fundamental importance to machine learning. Bayesian inference on
Gaussian processes has proven to be a viable approach for studying recurrent
and deep networks in the limit of infinite layer width, $n\to\infty$. Here we
present a unified and systematic derivation of the mean-field theory for both
architectures that starts from first principles by employing established
methods from statistical physics of disordered systems. The theory elucidates
that, while the mean-field equations differ in their temporal structure, they
yield identical Gaussian kernels when readouts are taken at a single time point
or layer, respectively. Bayesian inference applied to
classification then predicts identical performance and capabilities for the two
architectures. Numerically, we find that convergence towards the mean-field
theory is typically slower for recurrent networks than for deep networks, and that
the convergence speed depends non-trivially on the parameters of the weight
prior as well as the depth or number of time steps, respectively. Our method
exposes that Gaussian processes are but the lowest order of a systematic
expansion in $1/n$. The formalism thus paves the way to investigate the
fundamental differences between recurrent and deep architectures at finite
widths $n$.
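The identity of the single-readout kernels can be checked numerically. Below is a minimal sketch (not code from the paper), assuming ReLU activations, for which the Gaussian expectation in the mean-field recursion has a closed form (the degree-1 arc-cosine kernel of Cho and Saul, 2009), and a Gaussian weight prior of variance $\sigma_w^2/n$ with bias variance $\sigma_b^2$: it iterates the kernel recursion that both architectures share at a single readout layer or time point, and checks it against a Monte-Carlo estimate from finite-width random deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu_expect(kaa, kab, kbb):
    """E[relu(u) relu(v)] for (u, v) ~ N(0, [[kaa, kab], [kab, kbb]]):
    the degree-1 arc-cosine kernel (Cho & Saul, 2009)."""
    c = np.clip(kab / np.sqrt(kaa * kbb), -1.0, 1.0)
    theta = np.arccos(c)
    return np.sqrt(kaa * kbb) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2.0 * np.pi)

def mean_field_kernel(x, y, steps, sw2=2.0, sb2=0.1):
    """Iterate the mean-field kernel recursion
    K^{l+1} = sb2 + sw2 * E_{K^l}[phi(u) phi(v)].
    At a single readout layer (or, for the recurrent network, a single
    readout time) this recursion determines the Gaussian-process kernel."""
    n = len(x)
    kaa = sb2 + sw2 * (x @ x) / n
    kab = sb2 + sw2 * (x @ y) / n
    kbb = sb2 + sw2 * (y @ y) / n
    for _ in range(steps - 1):
        kaa, kab, kbb = (
            sb2 + sw2 * relu_expect(kaa, kaa, kaa),
            sb2 + sw2 * relu_expect(kaa, kab, kbb),
            sb2 + sw2 * relu_expect(kbb, kbb, kbb),
        )
    return kab

def empirical_kernel(x, y, steps, width, reps=200, sw2=2.0, sb2=0.1):
    """Monte-Carlo estimate of the same kernel from finite-width random
    deep networks with weights and biases drawn from the Gaussian prior."""
    acc = 0.0
    for _ in range(reps):
        hx, hy, n_in = x, y, len(x)
        for _ in range(steps):
            w = rng.normal(0.0, np.sqrt(sw2 / n_in), size=(width, n_in))
            b = rng.normal(0.0, np.sqrt(sb2), size=width)
            zx, zy = w @ hx + b, w @ hy + b
            hx, hy, n_in = np.maximum(zx, 0.0), np.maximum(zy, 0.0), width
        acc += zx @ zy / width   # pre-activation kernel at the readout
    return acc / reps

x, y = rng.normal(size=50), rng.normal(size=50)
print("mean-field kernel:  ", mean_field_kernel(x, y, steps=3))
print("width-1000 estimate:", empirical_kernel(x, y, steps=3, width=1000))
```

Repeating the Monte-Carlo estimate at decreasing widths illustrates the finite-size deviations that the systematic $1/n$ expansion in the paper quantifies.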
Related papers
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result in which the approximation benefits of the stronger architecture provably manifest in practice (a toy version of this learning task is sketched after this list).
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - The edge of chaos: quantum field theory and deep neural networks [0.0]
We explicitly construct the quantum field theory corresponding to a general class of deep neural networks.
We compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth $T$ to width $N$.
Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
arXiv Detail & Related papers (2021-09-27T18:00:00Z) - Reframing Neural Networks: Deep Structure in Overcomplete Representations [41.84502123663809]
We introduce deep frame approximation, a unifying framework for representation learning with structured overcomplete frames.
We quantify structural differences with the deep frame potential, a data-independent measure of coherence linked to representation uniqueness and stability.
This connection to the established theory of overcomplete representations suggests promising new directions for principled deep network architecture design.
arXiv Detail & Related papers (2021-03-10T01:15:14Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale problems with a deep neural network as the predictive model.
Our method provably requires far fewer communication rounds than baseline approaches.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
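As a toy illustration of the ball-indicator task from the optimization-based-separations entry above, the sketch below trains a depth-2 network with two layers of sigmoidal activations by plain gradient descent; it is not the construction or analysis from that paper, and the dimension, network size, sample count, and learning rate are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Target: indicator of the unit ball ||x|| <= 1, inputs sampled from [-2, 2]^d
d, n, hidden = 2, 2000, 32
X = rng.uniform(-2.0, 2.0, size=(n, d))
y = (np.linalg.norm(X, axis=1) <= 1.0).astype(float)

# Depth-2 network with two layers of sigmoidal activations:
# f(x) = sigmoid(w2 . sigmoid(W1 x + b1) + b2)
W1 = rng.normal(0.0, 1.0, size=(hidden, d))
b1 = np.zeros(hidden)
w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
b2 = 0.0

lr = 2.0
for step in range(10000):
    H = sigmoid(X @ W1.T + b1)            # hidden activations, (n, hidden)
    p = sigmoid(H @ w2 + b2)              # network output, (n,)
    # gradient of the mean squared error through both sigmoid layers
    delta = (p - y) * p * (1.0 - p) / n   # output-layer error signal
    gH = np.outer(delta, w2) * H * (1.0 - H)
    W1 -= lr * (gH.T @ X)
    b1 -= lr * gH.sum(axis=0)
    w2 -= lr * (H.T @ delta)
    b2 -= lr * delta.sum()

# roughly 80% of the labels are 0, so compare against that trivial baseline
print("fraction of positives:", y.mean())
print("train accuracy:", np.mean((p > 0.5) == (y > 0.5)))
```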