On Connecting Deep Trigonometric Networks with Deep Gaussian Processes:
Covariance, Expressivity, and Neural Tangent Kernel
- URL: http://arxiv.org/abs/2203.07411v1
- Date: Mon, 14 Mar 2022 18:14:59 GMT
- Title: On Connecting Deep Trigonometric Networks with Deep Gaussian Processes:
Covariance, Expressivity, and Neural Tangent Kernel
- Authors: Chi-Ken Lu and Patrick Shafto
- Abstract summary: We show that the weight-space view yields the same effective covariance functions that were obtained previously in function space.
The trig networks are flexible and expressive, as one can freely adopt different prior distributions over the parameters in the weight and feature layers.
- Score: 6.599344783327053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deep Gaussian process is a promising Bayesian learning model because it is
expressive and capable of uncertainty estimation. With Bochner's theorem, we can view a deep
Gaussian process with squared exponential kernels as a deep trigonometric network consisting
of random feature layers, sine and cosine activation units, and random weight layers. Focusing
on this particular class of models allows us to obtain analytical results. We show that the
weight-space view yields the same effective covariance functions that were obtained previously
in function space. The heavy statistical tails can be studied with the multivariate
characteristic function. In addition, the trig networks are flexible and expressive, as one can
freely adopt different prior distributions over the parameters in the weight and feature layers.
Lastly, the deep trigonometric network representation of the deep Gaussian process allows the
derivation of its neural tangent kernel, which can reveal the mean of the predictive
distribution for the otherwise intractable inference.
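
Below is a minimal sketch (not the authors' code) of the weight-space construction described in the abstract: by Bochner's theorem, the squared exponential kernel is approximated with random sine/cosine features, and stacking a random feature layer with a Gaussian random weight layer gives one GP layer; composing such layers yields a prior draw from the deep trigonometric network. The lengthscales, layer widths, feature counts, and unit-variance weight prior below are illustrative assumptions, not the paper's exact parameterization.

```python
# Sketch: deep trigonometric network view of a deep GP with SE kernels.
import numpy as np

rng = np.random.default_rng(0)

def trig_features(X, lengthscale, num_features):
    """Random Fourier features for the SE kernel via Bochner's theorem:
    Gaussian spectral frequencies, sine and cosine activation units."""
    d = X.shape[1]
    Omega = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    proj = X @ Omega
    # The 1/sqrt(m) factor makes phi(x)^T phi(x') approximate k(x, x').
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(num_features)

def deep_trig_network(X, layer_dims, lengthscales, num_features=500):
    """One prior draw: alternate random feature layers (fixed frequencies)
    with random Gaussian weight layers."""
    H = X
    for dim_out, ell in zip(layer_dims, lengthscales):
        Phi = trig_features(H, ell, num_features)      # (n, 2m) trig features
        W = rng.normal(size=(Phi.shape[1], dim_out))   # random weight layer, N(0, 1)
        H = Phi @ W                                    # inputs to the next GP layer
    return H

# One hidden GP layer of width 3 followed by a scalar output layer.
X = np.linspace(-3.0, 3.0, 200)[:, None]
f = deep_trig_network(X, layer_dims=[3, 1], lengthscales=[1.0, 1.0])
print(f.shape)  # (200, 1): a single prior sample of the deep GP output

# Sanity check: with one layer, the feature covariance approximates the SE kernel.
Phi = trig_features(X, lengthscale=1.0, num_features=2000)
print(np.abs(Phi @ Phi.T - np.exp(-0.5 * (X - X.T) ** 2)).max())  # small
```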
Related papers
- von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Feature learning in finite-width Bayesian deep linear networks with multiple outputs and convolutional layers [39.71511919246829]
Deep linear networks have been extensively studied, but little is known in the case of finite-width architectures with multiple outputs and convolutional layers.
Our work provides a dictionary that translates this physics intuition and terminology into rigorous Bayesian statistics.
arXiv Detail & Related papers (2024-06-05T13:37:42Z) - Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Bayesian inference with finitely wide neural networks [0.4568777157687961]
We propose a non-Gaussian distribution in differential form to model a finite set of outputs from a random neural network.
We are able to derive the non-Gaussian posterior distribution in a Bayesian regression task.
arXiv Detail & Related papers (2023-03-06T03:25:30Z) - NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to the space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Universality and Optimality of Structured Deep Kernel Networks [0.0]
Kernel-based methods yield approximation models that are flexible, efficient, and powerful.
The recent success of machine learning methods has been driven by deep neural networks (NNs).
In this paper, we show that the use of special types of kernels yields models reminiscent of neural networks.
arXiv Detail & Related papers (2021-05-15T14:10:35Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions needed to achieve comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and the Gaussian Process.
We implement the Gaussian Process corresponding to the infinite-limit tensor networks and plot sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z) - Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)