Scale Mixtures of Neural Network Gaussian Processes
- URL: http://arxiv.org/abs/2107.01408v1
- Date: Sat, 3 Jul 2021 11:02:18 GMT
- Title: Scale Mixtures of Neural Network Gaussian Processes
- Authors: Hyungi Lee, Eunggu Yun, Hongseok Yang, Juho Lee
- Abstract summary: We introduce a scale mixture of $\mathrm{NNGP}$ by placing a prior on the scale of the last-layer parameters.
We show that with certain scale priors we obtain heavy-tailed processes, and we recover Student's $t$ processes in the case of inverse gamma priors.
We further analyze neural networks initialized with our prior setting and trained with gradient descent, and obtain results similar to those for $\mathrm{NNGP}$.
- Score: 22.07524388784668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works have revealed that infinitely-wide feed-forward or recurrent
neural networks of any architecture correspond to Gaussian processes referred
to as $\mathrm{NNGP}$. While these works have extended the class of neural
networks converging to Gaussian processes significantly, there has
been little focus on broadening the class of stochastic processes that such
neural networks converge to. In this work, inspired by the scale mixture of
Gaussian random variables, we propose the scale mixture of $\mathrm{NNGP}$ for
which we introduce a prior distribution on the scale of the last-layer
parameters. We show that simply introducing a scale prior on the last-layer
parameters can turn infinitely-wide neural networks of any architecture into a
richer class of stochastic processes. In particular, with certain scale priors, we
obtain heavy-tailed stochastic processes, and we recover Student's $t$
processes in the case of inverse gamma priors. We further analyze the
distributions of the neural networks initialized with our prior setting and
trained with gradient descent and obtain results similar to those for
$\mathrm{NNGP}$. We present a practical posterior-inference algorithm for the
scale mixture of $\mathrm{NNGP}$ and empirically demonstrate its usefulness on
regression and classification tasks.
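As a purely illustrative reading of this construction (not the authors' code), the sketch below uses a single-hidden-layer ReLU (arc-cosine) kernel as a stand-in for an NNGP kernel; the inverse-gamma hyperparameters $a$, $b$ and all numerical values are assumptions. It draws the last-layer scale $\sigma^2$ from an inverse-gamma prior and then a function from the correspondingly rescaled NNGP kernel; marginalizing the scale yields Student's $t$ marginals with $2a$ degrees of freedom, matching the heavy-tailed behaviour described above.

```python
# Minimal sketch of the scale-mixture-of-NNGP idea (assumed kernel and hyperparameters).
import numpy as np

rng = np.random.default_rng(0)

def nngp_relu_kernel(x1, x2, sigma_w2=1.0, sigma_b2=0.1):
    # Single-hidden-layer ReLU NNGP (arc-cosine) kernel, used here only as a
    # stand-in for the NNGP kernel of whatever architecture is at hand.
    k11 = sigma_b2 + sigma_w2 * np.dot(x1, x1)
    k22 = sigma_b2 + sigma_w2 * np.dot(x2, x2)
    k12 = sigma_b2 + sigma_w2 * np.dot(x1, x2)
    theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
    return np.sqrt(k11 * k22) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

X = rng.normal(size=(5, 3))                              # 5 toy inputs in R^3
K = np.array([[nngp_relu_kernel(xi, xj) for xj in X] for xi in X])
K += 1e-8 * np.eye(len(X))                               # jitter for numerical stability

a, b = 5.0, 5.0                                          # inverse-gamma shape/scale (assumed)
n_samples = 20000
f = np.empty((n_samples, len(X)))
for i in range(n_samples):
    sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b)     # sigma^2 ~ InvGamma(a, b)
    f[i] = rng.multivariate_normal(np.zeros(len(X)), sigma2 * K)

# Heavy tails: each marginal of the mixture is Student's t with nu = 2a degrees of
# freedom, so its excess kurtosis should be roughly 6 / (nu - 4) = 1 for a = 5,
# whereas a pure Gaussian (no scale mixing) would give 0.
excess_kurtosis = (f**4).mean(0) / (f**2).mean(0) ** 2 - 3.0
print(excess_kurtosis)
```

The excess-kurtosis check at the end is one simple empirical signature of the heavy tails; with a fixed scale the same check concentrates around zero.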
Related papers
- Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection [11.729744197698718]
We present an algorithmic framework to approximate a neural network of finite width and depth.
We iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes.
Our results can represent an important step towards understanding neural network predictions.
arXiv Detail & Related papers (2024-07-26T12:45:53Z)
- Random ReLU Neural Networks as Non-Gaussian Processes [20.607307985674428]
We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
arXiv Detail & Related papers (2024-05-16T16:28:11Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
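For intuition about the stochastic dual descent summarized in this entry, here is a minimal, hedged sketch rather than the paper's algorithm: it runs randomized block-coordinate updates on the dual (representer-weight) objective of GP regression, whose minimizer gives the posterior mean. The RBF kernel, step size, batch size, and toy data are assumptions; the published method additionally uses momentum, iterate averaging, and carefully chosen step sizes.

```python
# Minimal sketch: stochastic (block-coordinate) descent on the GP regression dual.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=n)

lam = 0.01                                   # observation-noise variance (assumed)
K = rbf_kernel(X, X)
alpha = np.zeros(n)                          # representer weights
step, batch = 5e-3, 64

# Dual objective: L(alpha) = 0.5 * alpha^T (K + lam*I) alpha - y^T alpha,
# minimized at alpha* = (K + lam*I)^{-1} y (the usual GP regression weights).
for _ in range(5000):
    idx = rng.choice(n, size=batch, replace=False)
    grad_block = K[idx] @ alpha + lam * alpha[idx] - y[idx]  # exact gradient, restricted to the block
    alpha[idx] -= step * grad_block                          # update only the sampled coordinates

X_test = np.linspace(-3.0, 3.0, 5)[:, None]
print(rbf_kernel(X_test, X) @ alpha)         # approximate GP posterior mean at test points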
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Classifying high-dimensional Gaussian mixtures: Where kernel methods fail and neural networks succeed [27.38015169185521]
We show theoretically that two-layer neural networks (2LNN) with only a few hidden neurons can beat the performance of kernel learning.
We show how over-parametrising the neural network leads to faster convergence, but does not improve its final performance.
arXiv Detail & Related papers (2021-02-23T15:10:15Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with Gaussian process regression to generate real-valued sequences.
We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch.
The algorithm is proven to converge and yields solutions of comparable, if not better, quality.
arXiv Detail & Related papers (2021-02-10T01:19:15Z)
- Generalized Leverage Score Sampling for Neural Networks [82.95180314408205]
Leverage score sampling is a powerful technique that originates from theoretical computer science.
In this work, we generalize the results in [Avron, Kapralov, Musco, Musco, Velingker and Zandieh 17] to a broader class of kernels.
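As background on the generic technique named in this entry (not the paper's generalized construction), the sketch below computes exact ridge leverage scores for an assumed RBF kernel, samples landmark points proportionally to them, and forms a Nystrom approximation of the kernel matrix; in practice the scores themselves are usually approximated rather than computed exactly.

```python
# Minimal sketch of ridge leverage score sampling for a Nystrom kernel approximation.
import numpy as np

rng = np.random.default_rng(0)
n, lam, m = 400, 1e-2, 50                    # dataset size, ridge parameter, number of landmarks

X = rng.normal(size=(n, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * d2)                        # RBF kernel matrix (assumed kernel)

# Ridge leverage scores: l_i(lam) = [K (K + lam*I)^{-1}]_{ii}
scores = np.diag(K @ np.linalg.inv(K + lam * np.eye(n)))
probs = scores / scores.sum()

# Sample landmark columns with probability proportional to their leverage scores,
# then form the Nystrom approximation K ~= K_nm K_mm^{-1} K_nm^T.
idx = rng.choice(n, size=m, replace=False, p=probs)
K_nm = K[:, idx]
K_mm = K[np.ix_(idx, idx)] + 1e-8 * np.eye(m)
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)

print(np.linalg.norm(K - K_approx) / np.linalg.norm(K))   # relative approximation error
```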
arXiv Detail & Related papers (2020-09-21T14:46:01Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning sensitive to initialization.
We propose fusing neighboring layers of deeper networks that are initialized with random weights.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)