Wide Neural Networks with Bottlenecks are Deep Gaussian Processes
- URL: http://arxiv.org/abs/2001.00921v3
- Date: Mon, 6 Jul 2020 16:17:13 GMT
- Title: Wide Neural Networks with Bottlenecks are Deep Gaussian Processes
- Authors: Devanshu Agrawal, Theodore Papamarkou, Jacob Hinkle
- Abstract summary: We consider the wide limit of BNNs where some hidden layers, called "bottlenecks", are held at finite width.
Although the result is intuitive, the subtlety of the proof lies in showing that the wide limit of a composition of networks is in fact the composition of the limiting GPs.
We also analyze theoretically a single-bottleneck NNGP, finding that the bottleneck induces dependence between the outputs of a multi-output network that persists through extreme post-bottleneck depths.
- Score: 2.6641834518599308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has recently been much work on the "wide limit" of neural networks,
where Bayesian neural networks (BNNs) are shown to converge to a Gaussian
process (GP) as all hidden layers are sent to infinite width. However, these
results do not apply to architectures that require one or more of the hidden
layers to remain narrow. In this paper, we consider the wide limit of BNNs
where some hidden layers, called "bottlenecks", are held at finite width. The
result is a composition of GPs that we term a "bottleneck neural network
Gaussian process" (bottleneck NNGP). Although intuitive, the subtlety of the
proof is in showing that the wide limit of a composition of networks is in fact
the composition of the limiting GPs. We also analyze theoretically a
single-bottleneck NNGP, finding that the bottleneck induces dependence between
the outputs of a multi-output network that persists through extreme
post-bottleneck depths, and prevents the kernel of the network from losing
discriminative power at extreme post-bottleneck depths.
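To make the construction concrete, here is a minimal NumPy sketch of drawing a sample from a single-bottleneck NNGP prior. It assumes ReLU activations and the standard arc-cosine NNGP kernel recursion; the function names (`relu_nngp_kernel`, `sample_bottleneck_nngp`), the depths, and the weight/bias variances are illustrative assumptions, not values from the paper. The pre-bottleneck network is collapsed into its limiting GP kernel, the finite-width bottleneck is sampled explicitly from that GP, and the post-bottleneck NNGP kernel is then computed on the sampled bottleneck features.

```python
import numpy as np

def relu_nngp_kernel(K, sigma_w2=1.0, sigma_b2=0.1):
    """One step of the standard ReLU (arc-cosine) NNGP kernel recursion.

    Given the Gram matrix K of a layer's pre-activations, return the Gram
    matrix after an infinitely wide ReLU layer. The weight/bias variances
    are illustrative choices, not taken from the paper.
    """
    diag = np.sqrt(np.diag(K))
    outer = np.outer(diag, diag)
    theta = np.arccos(np.clip(K / outer, -1.0, 1.0))
    return sigma_b2 + (sigma_w2 * outer / (2.0 * np.pi)) * (
        np.sin(theta) + (np.pi - theta) * np.cos(theta)
    )

def sample_bottleneck_nngp(X, bottleneck_width=3, pre_depth=2, post_depth=2, seed=0):
    """Draw one function sample at inputs X from a single-bottleneck NNGP prior.

    Pre-bottleneck: `pre_depth` infinitely wide ReLU layers, summarised by
    their NNGP kernel. Bottleneck: `bottleneck_width` units held at finite
    width, so their pre-activations are sampled jointly from that GP.
    Post-bottleneck: another NNGP evaluated on the sampled bottleneck features.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    jitter = 1e-8 * np.eye(n)

    # Kernel of the pre-bottleneck NNGP (0.1 is an illustrative bias variance).
    K = 0.1 + X @ X.T / X.shape[1]
    for _ in range(pre_depth):
        K = relu_nngp_kernel(K)

    # Finite bottleneck: its units are i.i.d. GP draws with kernel K.
    L = np.linalg.cholesky(K + jitter)
    Z = L @ rng.standard_normal((n, bottleneck_width))   # bottleneck pre-activations
    Phi = np.maximum(Z, 0.0)                             # bottleneck activations

    # Post-bottleneck NNGP kernel, computed on the sampled bottleneck features.
    K2 = 0.1 + Phi @ Phi.T / bottleneck_width
    for _ in range(post_depth):
        K2 = relu_nngp_kernel(K2)

    # One scalar-output draw from the composed (bottleneck NNGP) prior.
    L2 = np.linalg.cholesky(K2 + jitter)
    return L2 @ rng.standard_normal(n)

# Example: one prior draw at five random 4-dimensional inputs.
X = np.random.default_rng(1).standard_normal((5, 4))
print(sample_bottleneck_nngp(X))
```

Conditioned on the sampled bottleneck, the outputs of a multi-output readout would be independent GP draws with kernel `K2`; marginally, however, all outputs share the same random bottleneck features `Phi`, which is the source of the dependence between outputs that the abstract describes.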
Related papers
- Robust Learning in Bayesian Parallel Branching Graph Neural Networks: The Narrow Width Limit [4.373803477995854]
We investigate the narrow width limit of the Bayesian Parallel Branching Graph Neural Network (BPB-GNN).
We show that when the width of a BPB-GNN is significantly smaller than the number of training examples, each branch exhibits more robust learning.
Our results characterize a newly defined narrow-width regime for parallel branching networks in general.
arXiv Detail & Related papers (2024-07-26T15:14:22Z)
- Graph Neural Networks Do Not Always Oversmooth [46.57665708260211]
We study oversmoothing in graph convolutional networks (GCNs) by using their Gaussian process (GP) equivalence in the limit of infinitely many hidden features.
We identify a new, non-oversmoothing phase: if the initial weights of the network have sufficiently large variance, GCNs do not oversmooth, and node features remain informative even at large depth.
arXiv Detail & Related papers (2024-06-04T12:47:13Z)
- Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
arXiv Detail & Related papers (2023-05-17T20:09:35Z)
- Deep Maxout Network Gaussian Process [1.9292807030801753]
We derive the equivalence of the deep, infinite-width maxout network and the Gaussian process (GP).
We build up the connection between our deep maxout network kernel and deep neural network kernels.
arXiv Detail & Related papers (2022-08-08T23:52:26Z)
- Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z)
- The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective [34.67386186205545]
This paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GPs).
Surprisingly, we prove that even nonparametric Deep GPs converge to Gaussian processes, effectively becoming shallower without any increase in representational power.
We find there is a "sweet spot" that maximizes test set performance before the limiting GP behavior prevents adaptability, occurring at width = 1 or width = 2 for nonparametric Deep GPs.
arXiv Detail & Related papers (2021-06-11T17:58:58Z)
- Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks [21.13299067136635]
We provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU networks.
In the finite-width setting, the network architectures we consider are quite general.
arXiv Detail & Related papers (2020-12-21T19:32:17Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- Bayesian Deep Ensembles via the Neural Tangent Kernel [49.569912265882124]
We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK).
We introduce a simple modification to standard deep ensembles training, through addition of a computationally-tractable, randomised and untrainable function to each ensemble member.
We prove that our Bayesian deep ensembles make more conservative predictions than standard deep ensembles in the infinite width limit.
arXiv Detail & Related papers (2020-07-11T22:10:52Z)
- Exact posterior distributions of wide Bayesian neural networks [51.20413322972014]
We show that the exact BNN posterior converges (weakly) to the one induced by the GP limit of the prior.
For empirical validation, we show how to generate exact samples from a finite BNN on a small dataset via rejection sampling.
arXiv Detail & Related papers (2020-06-18T13:57:04Z)
- On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)