Neural Network Gaussian Processes by Increasing Depth
- URL: http://arxiv.org/abs/2108.12862v1
- Date: Sun, 29 Aug 2021 15:37:26 GMT
- Title: Neural Network Gaussian Processes by Increasing Depth
- Authors: Shao-Qun Zhang and Feng-Lei Fan
- Abstract summary: We show that increasing the depth of a neural network can give rise to a Gaussian process.
We also theoretically characterize its uniform tightness property and the smallest eigenvalue of its associated kernel.
These characterizations can not only enhance our understanding of the proposed depth-induced Gaussian processes, but also pave the way for future applications.
- Score: 0.6091702876917281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed an increasing interest in the correspondence
between infinitely wide networks and Gaussian processes. Despite the
effectiveness and elegance of the current neural network Gaussian process
theory, to the best of our knowledge, all the neural network Gaussian processes
are essentially induced by increasing width. However, in the era of deep
learning, what concerns us more regarding a neural network is its depth as well
as how depth impacts the behaviors of a network. Inspired by a width-depth
symmetry consideration, we use a shortcut network to show that increasing the
depth of a neural network can also give rise to a Gaussian process, which is a
valuable addition to the existing theory and contributes to revealing the true
picture of deep learning. Beyond the proposed Gaussian process by depth, we
theoretically characterize its uniform tightness property and the smallest
eigenvalue of its associated kernel. These characterizations can not only
enhance our understanding of the proposed depth-induced Gaussian processes, but
also pave the way for future applications. Lastly, we examine the performance
of the proposed Gaussian process by regression experiments on two real-world
data sets.
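As an illustration of the depth-induced limit described in the abstract, below is a minimal Monte Carlo sketch. It assumes a toy shortcut (residual-style) network with a 1/sqrt(depth)-scaled branch and i.i.d. Gaussian weights, which is an illustrative stand-in for, not a reproduction of, the paper's construction. Over random weight draws, the empirical skewness and excess kurtosis of a fixed scalar readout should shrink toward zero as depth grows, consistent with an emerging Gaussian limit.

```python
# Monte Carlo sketch of a depth-induced Gaussian behavior.
# Illustrative assumptions (not necessarily the authors' exact construction):
# a narrow shortcut network h_{l+1} = h_l + tanh(W_l h_l) / sqrt(L) with
# i.i.d. Gaussian weights, a fixed input, and the first coordinate as readout.
import numpy as np

rng = np.random.default_rng(0)

def shortcut_output(x, depth, width=8):
    """One random draw of the network; returns a scalar readout."""
    h = np.tile(x, width // x.size + 1)[:width]      # embed the input into the hidden width
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = h + np.tanh(W @ h) / np.sqrt(depth)      # shortcut + scaled residual branch
    return h[0]                                      # fixed linear readout

x = np.array([0.3, -0.7, 1.1, 0.5])
for depth in (1, 4, 16, 64):
    samples = np.array([shortcut_output(x, depth) for _ in range(4000)])
    z = (samples - samples.mean()) / samples.std()
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    print(f"depth={depth:3d}  skewness={skew:+.3f}  excess kurtosis={excess_kurt:+.3f}")
```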
Related papers
- Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes [3.686808512438363]
We extend the proof of Matthews et al. to a larger class of initial weight distributions.
We show that fully-connected and convolutional networks with PSEUDO-IID distributions are all effectively equivalent up to their variance.
Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.
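The criticality tuning mentioned in this entry can be illustrated with the standard wide-network mean-field recursion. The sketch below assumes a tanh activation and Monte Carlo Gaussian expectations, and treats a sparse initialization as a variance-matched stand-in for the dense i.i.d. case, which is the "equivalent up to variance" point summarized above; the actual PSEUDO-IID analysis is not reproduced.

```python
# Sketch: Edge-of-Chaos tuning for a tanh network via the mean-field variance map,
# then a sparse initialization whose per-unit variance matches the dense i.i.d. case.
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(50_000)           # Monte Carlo nodes for Gaussian expectations

def fixed_point_q(sigma_w, sigma_b, iters=100):
    """Iterate q_{l+1} = sigma_w^2 E[tanh(sqrt(q_l) z)^2] + sigma_b^2."""
    q = 1.0
    for _ in range(iters):
        q = sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2
    return q

def chi(sigma_w, sigma_b):
    """Gradient susceptibility chi = sigma_w^2 E[tanh'(sqrt(q*) z)^2]; chi = 1 is critical."""
    q = fixed_point_q(sigma_w, sigma_b)
    return sigma_w**2 * np.mean(1.0 / np.cosh(np.sqrt(q) * z) ** 4)

sigma_b = 0.05
grid = np.linspace(0.8, 1.6, 41)
chis = np.array([chi(s, sigma_b) for s in grid])
sigma_w_crit = grid[np.argmin(np.abs(chis - 1.0))]
print(f"critical sigma_w ~ {sigma_w_crit:.3f} at sigma_b = {sigma_b}")

def sparse_init(fan_in, fan_out, p=0.25):
    """Sparse weights (keep-probability p) rescaled so the per-unit pre-activation
    variance matches the dense i.i.d. initialization at criticality."""
    mask = rng.random((fan_out, fan_in)) < p
    w = rng.normal(0.0, sigma_w_crit / np.sqrt(fan_in * p), size=(fan_out, fan_in))
    return w * mask
```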
arXiv Detail & Related papers (2023-10-25T12:38:36Z) - Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models [16.07760622196666]
We study the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers.
Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process.
Remarkably, this convergence holds even when the limits of depth and width are interchanged.
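A deep equilibrium model can be sketched in a handful of lines: one weight-tied layer whose output is defined as a fixed point rather than by explicitly stacking layers. The simple fixed-point solver and the spectral rescaling below (to guarantee a contraction) are illustrative choices, not details taken from the paper.

```python
# Sketch of a DEQ forward pass: solve z* = tanh(W z* + U x + b) for the
# fixed point of a single weight-tied layer.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 16, 64

W = rng.standard_normal((d_hid, d_hid))
W *= 0.9 / np.linalg.norm(W, 2)          # rescale so the iteration is a contraction
U = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_hid, d_in))
b = np.zeros(d_hid)

def deq_forward(x, tol=1e-8, max_iter=500):
    """Fixed-point iteration z_{k+1} = tanh(W z_k + U x + b)."""
    z = np.zeros(d_hid)
    for k in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, k + 1
        z = z_next
    return z, max_iter

x = rng.standard_normal(d_in)
z_star, n_iter = deq_forward(x)
print(f"converged in {n_iter} iterations, ||z*|| = {np.linalg.norm(z_star):.3f}")
```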
arXiv Detail & Related papers (2023-10-16T19:00:43Z) - Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
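For context, the kernel-ridge-regression predictor that such generalization analyses study has a short closed form. The sketch below uses the degree-1 arc-cosine (ReLU) kernel as an illustrative infinite-width kernel together with synthetic data; the sparsity-dependent kernels analyzed in the paper are not reproduced.

```python
# Sketch: kernel ridge regression with an arc-cosine (ReLU) kernel, the kind of
# limiting kernel that arises for infinitely wide ReLU networks.
import numpy as np

def arccos_kernel(X, Y):
    """Degree-1 arc-cosine kernel k(x,y) = (1/pi) ||x|| ||y|| (sin t + (pi - t) cos t)."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)
    cos_t = np.clip((X @ Y.T) / (nx * ny.T), -1.0, 1.0)
    t = np.arccos(cos_t)
    return (nx * ny.T) * (np.sin(t) + (np.pi - t) * cos_t) / np.pi

rng = np.random.default_rng(0)
X_train = rng.standard_normal((200, 5))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(200)   # toy regression target
X_test = rng.standard_normal((50, 5))

lam = 1e-2                                    # ridge regularization strength
K = arccos_kernel(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
y_pred = arccos_kernel(X_test, X_train) @ alpha
print("test predictions:", y_pred[:5])
```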
arXiv Detail & Related papers (2023-05-17T20:09:35Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
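The empirical NTK of a one-hidden-layer ReLU network with a bias term can be written down in closed form, which makes the object under study concrete. In the sketch below the bias scale `beta` is an illustrative knob; the paper's exact bias-generalized limiting kernel is not reproduced.

```python
# Sketch: empirical NTK of f(x) = (1/sqrt(m)) sum_j a_j relu(w_j . x + b_j),
# i.e. the sum over parameters of (df/dtheta at x1) * (df/dtheta at x2).
import numpy as np

rng = np.random.default_rng(0)
d, m, beta = 10, 5000, 2.0                  # input dim, width, bias scale (assumption)
W = rng.standard_normal((m, d))
b = beta * rng.standard_normal(m)
a = rng.choice([-1.0, 1.0], size=m)

def empirical_ntk(x1, x2):
    z1, z2 = W @ x1 + b, W @ x2 + b
    act1, act2 = np.maximum(z1, 0.0), np.maximum(z2, 0.0)
    ind1, ind2 = (z1 > 0).astype(float), (z2 > 0).astype(float)
    k_a = act1 @ act2                                       # gradients w.r.t. output weights a_j
    k_wb = (x1 @ x2 + 1.0) * np.sum(a**2 * ind1 * ind2)     # gradients w.r.t. w_j and b_j
    return (k_a + k_wb) / m

x, y = rng.standard_normal(d), rng.standard_normal(d)
print("NTK(x, y) =", empirical_ntk(x, y))
print("NTK(x, x) =", empirical_ntk(x, x))
```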
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Deep Maxout Network Gaussian Process [1.9292807030801753]
We derive the equivalence of the deep, infinite-width maxout network and the Gaussian process (GP).
We build up the connection between our deep maxout network kernel and deep neural network kernels.
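The single-layer building block of such a kernel can be estimated by Monte Carlo: sample maxout units with i.i.d. Gaussian parameters and average the product of their outputs on two inputs. The deep kernel composes such layer maps; only the one-layer estimate is sketched here, with illustrative hyperparameters.

```python
# Sketch: Monte Carlo estimate of a one-layer maxout NNGP kernel
# K(x, x') = E[ h(x) h(x') ],  h(x) = max_k (w_k . x + c_k),
# with i.i.d. Gaussian weights and biases shared across the two inputs.
import numpy as np

rng = np.random.default_rng(0)

def maxout_kernel_mc(x1, x2, K=3, sigma_w=1.0, sigma_b=0.1, n_units=100_000):
    d = x1.shape[0]
    W = rng.normal(0.0, sigma_w / np.sqrt(d), size=(n_units, K, d))
    c = rng.normal(0.0, sigma_b, size=(n_units, K))
    h1 = (W @ x1 + c).max(axis=1)       # maxout activations for x1
    h2 = (W @ x2 + c).max(axis=1)       # same random units evaluated at x2
    return np.mean(h1 * h2)

x1 = np.array([1.0, 0.5, -0.2])
x2 = np.array([0.3, -1.0, 0.8])
print("K(x1, x2) ~", maxout_kernel_mc(x1, x2))
print("K(x1, x1) ~", maxout_kernel_mc(x1, x1))
```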
arXiv Detail & Related papers (2022-08-08T23:52:26Z) - The Principles of Deep Learning Theory [19.33681537640272]
This book develops an effective theory approach to understanding deep neural networks of practical relevance.
We explain how these effectively-deep networks learn nontrivial representations from training.
We show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks.
arXiv Detail & Related papers (2021-06-18T15:00:00Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
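Linearization here means a first-order Taylor expansion of the network output in its parameters around initialization. The toy tanh network below makes the mechanics concrete (the gap between the exact and linearized outputs grows with the step size); it is an illustrative example, not one of the paper's experiments.

```python
# Sketch: NTK-style linearization around initialization,
# f_lin(theta) = f(theta0) + grad_theta f(theta0) . (theta - theta0).
import numpy as np

rng = np.random.default_rng(0)
d, m = 8, 256
W0 = rng.standard_normal((m, d)) / np.sqrt(d)
a0 = rng.standard_normal(m)
x = rng.standard_normal(d)

def f(W, a):
    return a @ np.tanh(W @ x) / np.sqrt(m)

def grads(W, a):
    """Analytic gradients of f at (W, a) for this one-hidden-layer tanh network."""
    h = np.tanh(W @ x)
    gW = np.outer(a * (1.0 - h**2), x) / np.sqrt(m)    # df/dW
    ga = h / np.sqrt(m)                                # df/da
    return gW, ga

gW, ga = grads(W0, a0)
f0 = f(W0, a0)
for scale in (0.01, 0.1, 1.0):
    dW = scale * rng.standard_normal((m, d)) / np.sqrt(d)
    da = scale * rng.standard_normal(m)
    exact = f(W0 + dW, a0 + da)
    linear = f0 + np.sum(gW * dW) + ga @ da
    print(f"step scale {scale:>4}: exact={exact:+.4f}  linearized={linear:+.4f}")
```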
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
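A minimal forward pass of this edge-gated, complete-graph view might look as follows; the node operations, the sigmoid gating, and training of the edge logits by backpropagation are illustrative assumptions rather than the paper's exact design.

```python
# Sketch: a stage of a network as a complete DAG over nodes, with a learnable
# scalar on every edge gating how much of each earlier node's output flows into
# each later node. Only the forward pass is shown; in practice the edge logits
# `alpha` would be trained jointly with the node operations by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, width = 5, 32

alpha = rng.standard_normal((n_nodes, n_nodes))                       # edge logits for j -> i, j < i
ops = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(n_nodes)]  # toy node ops

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def stage_forward(x):
    outs = [np.maximum(ops[0] @ x, 0.0)]                 # node 0 consumes the stage input
    for i in range(1, n_nodes):
        gates = sigmoid(alpha[:i, i])                    # edge strengths from nodes j < i
        agg = sum(g * o for g, o in zip(gates, outs))    # weighted, differentiable aggregation
        outs.append(np.maximum(ops[i] @ agg, 0.0))
    return outs[-1]                                      # last node is the stage output

y = stage_forward(rng.standard_normal(width))
print("stage output norm:", np.linalg.norm(y))
```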
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.