Deep neural networks with dependent weights: Gaussian Process mixture
limit, heavy tails, sparsity and compressibility
- URL: http://arxiv.org/abs/2205.08187v2
- Date: Mon, 11 Sep 2023 05:07:11 GMT
- Title: Deep neural networks with dependent weights: Gaussian Process mixture
limit, heavy tails, sparsity and compressibility
- Authors: Hoil Lee, Fadhel Ayed, Paul Jung, Juho Lee, Hongseok Yang and
François Caron
- Abstract summary: This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent.
Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node.
- Score: 18.531464406721412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article studies the infinite-width limit of deep feedforward neural
networks whose weights are dependent, and modelled via a mixture of Gaussian
distributions. Each hidden node of the network is assigned a nonnegative random
variable that controls the variance of the outgoing weights of that node. We
make minimal assumptions on these per-node random variables: they are iid and
their sum, in each layer, converges to some finite random variable in the
infinite-width limit. Under this model, we show that each layer of the
infinite-width neural network can be characterised by two simple quantities: a
non-negative scalar parameter and a Lévy measure on the positive reals. If
the scalar parameters are strictly positive and the Lévy measures are trivial
at all hidden layers, then one recovers the classical Gaussian process (GP)
limit, obtained with iid Gaussian weights. More interestingly, if the Lévy
measure of at least one layer is non-trivial, we obtain a mixture of Gaussian
processes (MoGP) in the large-width limit. The behaviour of the neural network
in this regime is very different from the GP regime. One obtains correlated
outputs, with non-Gaussian distributions, possibly with heavy tails.
Additionally, we show that, in this regime, the weights are compressible, and
some nodes have asymptotically non-negligible contributions, therefore
representing important hidden features. Many sparsity-promoting neural network
models can be recast as special cases of our approach, and we discuss their
infinite-width limits; we also present an asymptotic analysis of the pruning
error. We illustrate some of the benefits of the MoGP regime over the GP regime
in terms of representation learning and compressibility on simulated, MNIST and
Fashion MNIST datasets.
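To make the dependent-weight construction concrete, the following is a minimal NumPy sketch (not the authors' code) of a forward pass through a finite-width network in which every hidden node draws a nonnegative random scale that sets the variance of all of its outgoing weights. The 1/width normalisation and the two per-node scale distributions, constant scales for the iid-Gaussian (GP-like) case and Pareto scales for a heavy-tailed (MoGP-like) case, are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def dependent_weight_forward(widths, node_scale_sampler, x):
    """Forward pass through a finite-width feedforward net whose weights are
    conditionally Gaussian: each hidden node draws a nonnegative scale that
    multiplies the variance of all of its outgoing weights, so weights sharing
    a source node are dependent through that common scale."""
    h = np.asarray(x, dtype=float)
    n_layers = len(widths) - 1
    for l in range(n_layers):
        n_in, n_out = widths[l], widths[l + 1]
        # iid per-node variance variables for the n_in source nodes; dividing
        # by n_in keeps the layer-wise sum of scales O(1) as the width grows
        # (an assumption of this sketch, not the paper's exact normalisation).
        lam = node_scale_sampler(rng, n_in) / n_in
        # Column j of W holds the outgoing weights of node j; all of them
        # share the variance lam[j].
        W = rng.normal(size=(n_out, n_in)) * np.sqrt(lam)[None, :]
        b = rng.normal(size=n_out)
        pre = W @ h + b
        h = np.maximum(pre, 0.0) if l < n_layers - 1 else pre  # ReLU except last layer
    return h

# Constant scales: iid Gaussian weights, i.e. the classical GP-limit setting.
gp_scales = lambda rng, n: np.ones(n)
# Heavy-tailed scales (hypothetical Pareto choice): pushes the network towards
# the mixture-of-Gaussian-processes regime described in the abstract.
mogp_scales = lambda rng, n: 1.0 + rng.pareto(1.5, size=n)

x = rng.normal(size=10)
print(dependent_weight_forward([10, 512, 512, 1], gp_scales, x))
print(dependent_weight_forward([10, 512, 512, 1], mogp_scales, x))
```

Sampling many outputs under the two scale choices illustrates the qualitative contrast described above: roughly Gaussian outputs in the first case, and heavier-tailed, more variable outputs in the second.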
Related papers
- Random ReLU Neural Networks as Non-Gaussian Processes [20.607307985674428]
We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
arXiv Detail & Related papers (2024-05-16T16:28:11Z)
- Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z)
- Posterior Inference on Shallow Infinitely Wide Bayesian Neural Networks under Weights with Unbounded Variance [1.5960546024967326]
It is known that the infinite width scaling limit of a Bayesian neural network with one hidden layer is a Gaussian process, when the network weights have bounded prior variance.
Neal's result has been extended to networks with multiple hidden layers and to convolutional neural networks, also with Gaussian process scaling limits.
Our contribution is an interpretable and computationally efficient procedure for posterior inference, using a conditionally Gaussian representation, which then allows full use of the Gaussian process machinery for tractable posterior inference and uncertainty quantification in the non-Gaussian regime.
arXiv Detail & Related papers (2023-05-18T02:55:00Z)
- A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets, on the order of $P^* \sim \sqrt{N}$, for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and Gaussian processes.
We implement the Gaussian process corresponding to the infinite-width limit of tensor networks and plot sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z)
- Characteristics of Monte Carlo Dropout in Wide Neural Networks [16.639005039546745]
Monte Carlo (MC) dropout is one of the state-of-the-art approaches for uncertainty estimation in neural networks (NNs); a minimal illustrative sketch of MC dropout prediction appears after this list.
We study the limiting distribution of wide untrained NNs under dropout more rigorously and prove that they, too, converge to Gaussian processes for fixed sets of weights and biases.
We investigate how (strongly) correlated pre-activations can induce non-Gaussian behavior in NNs with strongly correlated weights.
arXiv Detail & Related papers (2020-07-10T15:14:43Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a suitable structure.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
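As an aside on the Monte Carlo dropout entry above, here is a minimal, self-contained Python sketch of the basic MC dropout recipe it refers to: keep dropout active at prediction time and repeat stochastic forward passes to obtain a predictive mean and a spread that serves as an uncertainty estimate. The tiny two-layer network, its random placeholder weights, and the dropout rate are hypothetical choices for illustration, not taken from any of the listed papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny fixed two-layer ReLU network; the weights are arbitrary placeholders.
W1, b1 = rng.normal(size=(64, 8)), np.zeros(64)
W2, b2 = rng.normal(size=(1, 64)) / np.sqrt(64), np.zeros(1)

def forward_with_dropout(x, p=0.5):
    """One stochastic forward pass with dropout kept ON at prediction time."""
    h = np.maximum(W1 @ x + b1, 0.0)
    mask = rng.random(h.shape) > p   # Bernoulli(1 - p) keep-mask per hidden unit
    h = h * mask / (1.0 - p)         # inverted-dropout rescaling
    return W2 @ h + b2

def mc_dropout_predict(x, n_samples=200):
    """Monte Carlo dropout: average repeated stochastic passes; the standard
    deviation across passes is the uncertainty estimate."""
    samples = np.array([forward_with_dropout(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

x = rng.normal(size=8)
mean, std = mc_dropout_predict(x)
print(f"predictive mean {mean}, predictive std {std}")
```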
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.