Infinitely Wide Tensor Networks as Gaussian Process
- URL: http://arxiv.org/abs/2101.02333v1
- Date: Thu, 7 Jan 2021 02:29:15 GMT
- Title: Infinitely Wide Tensor Networks as Gaussian Process
- Authors: Erdong Guo and David Draper
- Abstract summary: In this paper, we show the equivalence of infinitely wide Tensor Networks and the Gaussian Process.
We implement the Gaussian Process corresponding to the infinite-limit tensor networks and plot the sample paths of these models.
- Score: 1.7894377200944511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Gaussian Process is a non-parametric prior that can be understood intuitively as a distribution over function space. It is known that, by placing appropriate priors on the weights of a neural network, a Gaussian Process can be obtained from a Bayesian perspective by taking the infinite-width limit of the Bayesian neural network. In this paper, we explore infinitely wide Tensor Networks and show the equivalence of the infinitely wide Tensor Network and the Gaussian Process. We study the pure Tensor Network and two extended Tensor Network structures, the Neural Kernel Tensor Network and the Tensor Network hidden layer Neural Network, and prove that each one converges to a Gaussian Process as the width of the model goes to infinity. (We note that a Gaussian Process can also be obtained by taking the infinite limit of at least one of the bond dimensions $\alpha_{i}$ in the product of tensor nodes, and the proofs follow the same ideas as in the infinite-width cases.) We calculate the mean function (mean vector) and the covariance function (covariance matrix) of the finite-dimensional distributions of the Gaussian Process induced by the infinite-width tensor network in a general set-up. We study the properties of the covariance function and derive an approximation of the covariance function when the integral in the expectation operator is intractable. In the numerical experiments, we implement the Gaussian Process corresponding to the infinite-limit tensor networks and plot the sample paths of these models. We study the hyperparameters and plot families of sample paths of the induced Gaussian Process, varying the standard deviations of the prior distributions. As expected, the parameters of the prior distribution, namely the hyperparameters of the induced Gaussian Process, control the characteristic lengthscales of the Gaussian Process.
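As a rough illustration of the workflow described in the abstract (approximating the induced covariance when the expectation integral is intractable, then drawing sample paths while varying the prior standard deviations), here is a minimal Python sketch. It does not implement the paper's tensor-network contraction; a hypothetical one-hidden-layer tanh feature model stands in for it, the covariance E[f(x) f(x')] is estimated by Monte Carlo over random prior draws, and zero-mean Gaussian Process sample paths are drawn from the estimated covariance for several values of an assumed prior standard deviation sigma_w. The helper names (mc_covariance, gp_sample_paths) are illustrative only.

```python
# Minimal sketch, not the paper's exact construction: a one-hidden-layer tanh
# feature model stands in for the tensor-network contraction. We (1) estimate
# the induced covariance K(x, x') = E[f(x) f(x')] by Monte Carlo over random
# prior draws, and (2) draw zero-mean GP sample paths from that covariance.
import numpy as np

rng = np.random.default_rng(0)

def mc_covariance(xs, width=2000, n_models=200, sigma_w=1.0, sigma_v=1.0):
    """Monte Carlo estimate of K(x, x') = E[f(x) f(x')] for the random model
    f(x) = v . tanh(w * x) / sqrt(width), with Gaussian priors on w and v."""
    fs = np.zeros((n_models, len(xs)))
    for m in range(n_models):
        w = rng.normal(0.0, sigma_w, size=width)   # hidden-layer weights
        v = rng.normal(0.0, sigma_v, size=width)   # output weights
        feats = np.tanh(np.outer(xs, w))           # shape (n_points, width)
        fs[m] = feats @ v / np.sqrt(width)         # one random function draw
    return fs.T @ fs / n_models                    # empirical covariance matrix

def gp_sample_paths(K, n_paths=3, jitter=1e-8):
    """Draw zero-mean Gaussian Process sample paths from a covariance matrix K."""
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))
    return (L @ rng.normal(size=(len(K), n_paths))).T

xs = np.linspace(-3.0, 3.0, 80)
for sigma_w in (0.5, 1.0, 2.0):   # prior standard deviation being varied
    K = mc_covariance(xs, sigma_w=sigma_w)
    paths = gp_sample_paths(K)
    # Larger sigma_w makes the tanh features vary faster in x, i.e. a shorter
    # characteristic lengthscale of the induced Gaussian Process.
    print(f"sigma_w = {sigma_w}: sample-path std ~ {paths.std():.3f}")
```

Under this stand-in model, increasing sigma_w shortens the characteristic lengthscale of the sampled paths, which is the qualitative behaviour the abstract reports for the hyperparameters of the induced Gaussian Process.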
Related papers
- von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
- Random ReLU Neural Networks as Non-Gaussian Processes [20.607307985674428]
We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
arXiv Detail & Related papers (2024-05-16T16:28:11Z)
- Wide Deep Neural Networks with Gaussian Weights are Very Close to Gaussian Processes [1.0878040851638]
We show that the distance between the network output and the corresponding Gaussian approximation scales inversely with the width of the network, exhibiting faster convergence than naively suggested by the central limit theorem.
We also apply our bounds to obtain theoretical approximations for the exact posterior distribution of the network, when the likelihood is a bounded Lipschitz function of the network output evaluated on a (finite) training set.
arXiv Detail & Related papers (2023-12-18T22:29:40Z)
- Posterior Contraction Rates for Mat\'ern Gaussian Processes on Riemannian Manifolds [51.68005047958965]
We show that intrinsic Gaussian processes can achieve better performance in practice.
Our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency.
arXiv Detail & Related papers (2023-09-19T20:30:58Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- On Connecting Deep Trigonometric Networks with Deep Gaussian Processes: Covariance, Expressivity, and Neural Tangent Kernel [6.599344783327053]
We show that the weight-space view yields the same effective covariance functions that were obtained previously in function space.
The trig networks are flexible and expressive, as one can freely adopt different prior distributions over the parameters in the weight and feature layers.
arXiv Detail & Related papers (2022-03-14T18:14:59Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- The Ridgelet Prior: A Covariance Function Approach to Prior Specification for Bayesian Neural Networks [4.307812758854161]
We construct a prior distribution for the parameters of a network that approximates the posited Gaussian process in the output space of the network.
This establishes the property that a Bayesian neural network can approximate any Gaussian process whose covariance function is sufficiently regular.
arXiv Detail & Related papers (2020-10-16T16:39:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.