Related papers: Random ReLU Neural Networks as Non-Gaussian Processes

URL: http://arxiv.org/abs/2405.10229v1
Date: Thu, 16 May 2024 16:28:11 GMT
Title: Random ReLU Neural Networks as Non-Gaussian Processes
Authors: Rahul Parhi, Pakshal Bohra, Ayoub El Biari, Mehrsa Pourya, Michael Unser
Abstract summary: We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
Score: 20.607307985674428
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
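The abstract's width model (a Poisson-distributed number of neurons whose activation thresholds are scattered with a given density) can be illustrated in one dimension. What follows is a simplified sketch under stated assumptions, not the paper's construction. The names `sample_random_relu_net_1d` and `evaluate`, the restriction to a bounded interval, the random ±1 orientations, and the Gaussian output weights are all hypothetical choices. The knots are generated as a homogeneous Poisson point process, so the number of neurons on [-radius, radius] is Poisson with mean `lam * 2 * radius`, which mirrors the abstract's "mean proportional to the density parameter".

```python
import random


def sample_random_relu_net_1d(lam, radius, rng, weight_law=None):
    """Draw one random shallow ReLU network restricted to [-radius, radius].

    Hypothetical 1-D sketch: the knots (activation thresholds) form a
    homogeneous Poisson point process with density `lam`, so the number of
    neurons is Poisson(lam * 2 * radius) instead of a fixed width.
    """
    if weight_law is None:
        # Assumed law of the output weights: standard Gaussian.
        def weight_law():
            return rng.gauss(0.0, 1.0)
    knots = []
    t = -radius
    while True:
        # i.i.d. exponential gaps with rate `lam` generate a Poisson process.
        t += rng.expovariate(lam)
        if t > radius:
            break
        knots.append(t)
    # Each neuron is relu(s * (x - t)) with a random orientation s in {-1, +1}.
    signs = [rng.choice((-1.0, 1.0)) for _ in knots]
    weights = [weight_law() for _ in knots]
    return knots, signs, weights


def evaluate(net, x):
    """Evaluate f(x) = sum_k v_k * relu(s_k * (x - t_k))."""
    knots, signs, weights = net
    return sum(v * max(0.0, s * (x - t)) for t, s, v in zip(knots, signs, weights))


if __name__ == "__main__":
    rng = random.Random(0)
    net = sample_random_relu_net_1d(lam=5.0, radius=1.0, rng=rng)
    print("width:", len(net[0]), "f(0.3):", evaluate(net, 0.3))
    # The width is random; its mean should be close to lam * 2 * radius = 10.
    widths = [len(sample_random_relu_net_1d(5.0, 1.0, rng)[0]) for _ in range(2000)]
    print("mean width:", sum(widths) / len(widths))
```

Passing a heavy-tailed sampler as `weight_law` is one way to experiment with the abstract's point that the limiting behavior depends on the law of the weights; the sketch itself makes no claim about which laws lead to which limits.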

Related papers

Deep Kernel Posterior Learning under Infinite Variance Prior Weights [1.5960546024967326]
We show that a Bayesian deep neural network converges to a process with $\alpha$-stable marginals in each layer that has a conditionally Gaussian representation. We also provide useful generalizations of the results of Loría & Bhadra (2024) from shallow networks to multi-layer networks. The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets.
arXiv Detail & Related papers (2024-10-02T07:13:17Z)
Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection [11.729744197698718]
We present an algorithmic framework to approximate a neural network of finite width and depth. We iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Our results can represent an important step towards understanding neural network predictions.
arXiv Detail & Related papers (2024-07-26T12:45:53Z)
Bayesian Circular Regression with von Mises Quasi-Processes [57.88921637944379]
In this work we explore a family of expressive and interpretable distributions over circle-valued random functions. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Gibbs sampling. We present experiments applying this model to the prediction of wind directions and the percentage of the running gait cycle as a function of joint angles.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds [51.68005047958965]
We show that intrinsic Gaussian processes can achieve better performance in practice. Our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency.
arXiv Detail & Related papers (2023-09-19T20:30:58Z)
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling. We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
Deep neural networks with dependent weights: Gaussian Process mixture limit, heavy tails, sparsity and compressibility [18.531464406721412]
This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node.
arXiv Detail & Related papers (2022-05-17T09:14:32Z)
Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes. We introduce a wide class of efficient approximations built from this viewpoint. We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z)
The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can, with high probability, make two classes of data (at positive distance from each other) linearly separable. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
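The data-separation question in *The Separation Capacity of Random Neural Networks* can be probed numerically. In the sketch below, the XOR-style toy data, the width of 200, the bias range [-2, 2], and the helper names are illustrative assumptions, not the paper's setup or its quantitative guarantees. Four points that no line can separate are lifted through one random ReLU hidden layer with standard Gaussian weights and uniform biases, and a perceptron then checks whether the lifted points admit a separating hyperplane (the perceptron finishes an epoch with zero mistakes exactly when it has found one).

```python
import random


def random_relu_features(points, width, bias_range, rng):
    """Lift 2-D points through one random ReLU layer with standard Gaussian
    weights and biases uniform on [-bias_range, bias_range]."""
    params = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)),
               rng.uniform(-bias_range, bias_range)) for _ in range(width)]
    return [[max(0.0, w[0] * x + w[1] * y + b) for w, b in params]
            for x, y in points]


def perceptron_separates(features, labels, max_epochs=1000):
    """Run the perceptron with a bias term; return True once an epoch makes no
    mistakes (a separating hyperplane was found). False only means none was
    found within max_epochs."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for phi, y in zip(features, labels):
            if y * (sum(wi * pi for wi, pi in zip(w, phi)) + b) <= 0:
                w = [wi + y * pi for wi, pi in zip(w, phi)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


if __name__ == "__main__":
    xor_points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
    xor_labels = [1, 1, -1, -1]
    # XOR is not linearly separable in the input space, so this prints False.
    print(perceptron_separates([list(p) for p in xor_points], xor_labels))
    feats = random_relu_features(xor_points, 200, 2.0, random.Random(0))
    # After the random lift, separation is expected with high probability.
    print(perceptron_separates(feats, xor_labels))
```

The perceptron is used only because its zero-mistake epoch is an exact certificate of linear separability; a margin-maximizing linear classifier would serve the same diagnostic role.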
Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference. Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions. Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and Gaussian processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
Non-asymptotic approximations of neural networks by Gaussian processes [7.56714041729893]
We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights. As the width of a network goes to infinity, its law converges to that of a Gaussian process.
arXiv Detail & Related papers (2021-02-17T10:19:26Z)
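The classical statement in this entry (the law of a randomly initialized network approaches a Gaussian as the width grows) can be checked at a single input with a small simulation. The parameterization below (scalar input, i.i.d. standard Gaussian weights and biases, output scaled by `1/sqrt(width)`) and the function names are assumptions chosen for illustration, and excess kurtosis serves as a simple diagnostic of non-Gaussianity.

```python
import random


def network_output(width, x, rng):
    """One draw of f(x) = width**-0.5 * sum_i v_i * relu(w_i * x + b_i),
    with every parameter i.i.d. standard Gaussian (assumed initialization)."""
    total = 0.0
    for _ in range(width):
        v, w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total += v * max(0.0, w * x + b)
    return total / width ** 0.5


def excess_kurtosis(samples):
    """Sample excess kurtosis (fourth standardized moment minus 3); 0 for a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / m2 ** 2 - 3.0


if __name__ == "__main__":
    rng = random.Random(0)
    for width in (1, 10, 50):
        draws = [network_output(width, 0.5, rng) for _ in range(5000)]
        # Under these assumptions the population value is 15 / width.
        print(width, round(excess_kurtosis(draws), 2))
```

Under this initialization a single neuron's contribution `v * relu(z)` has excess kurtosis 15 (a short moment computation with Gaussian `v` and a half-rectified Gaussian `relu(z)`), and summing `width` independent copies divides that by `width`, so the printed values should shrink toward 0. A vanishing excess kurtosis at one input is only a symptom of Gaussianity, not a proof of convergence to a Gaussian process.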
Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and Gaussian processes. We implement the Gaussian process corresponding to infinite-width tensor networks and plot the sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z)
Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations. This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector. We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
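The pathwise view replaces "sample from the posterior marginal" with "draw from the prior, then correct that draw". The update rule at the heart of this approach is commonly called Matheron's rule; a minimal pure-Python sketch of it for exact GP regression is below. The squared-exponential kernel with unit lengthscale, the jitter value, and all function names are illustrative assumptions, and the sketch still factorizes a dense joint covariance, so it shows the update rule rather than any of the scalable approximations the summary refers to.

```python
import math
import random


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel (assumed; any positive-definite kernel works)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(mat):
    """Lower-triangular L with L @ L^T == mat, for a small dense SPD matrix."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(mat[i][i] - s)
            else:
                low[i][j] = (mat[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low, rhs):
    """Solve (L L^T) x = rhs by forward then back substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def pathwise_posterior_sample(train_x, train_y, test_x, noise_var, rng, jitter=1e-9):
    """One posterior draw at test_x via Matheron's rule:
    f*|y = f* + K(x*, X) (K(X, X) + noise_var I)^{-1} (y - f(X) - eps)."""
    pts = list(train_x) + list(test_x)
    n, m = len(train_x), len(pts)
    # 1) Joint prior draw over training and test inputs (jitter for stability).
    k_joint = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(pts)]
               for i, a in enumerate(pts)]
    low = cholesky(k_joint)
    std_normals = [rng.gauss(0.0, 1.0) for _ in range(m)]
    prior = [sum(low[i][k] * std_normals[k] for k in range(i + 1)) for i in range(m)]
    # 2) Simulated observation noise eps ~ N(0, noise_var I).
    eps = [rng.gauss(0.0, math.sqrt(noise_var)) for _ in range(n)]
    # 3) Correct the prior draw using the residual of the simulated observations.
    k_xx = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(train_x)]
            for i, a in enumerate(train_x)]
    resid = [train_y[i] - prior[i] - eps[i] for i in range(n)]
    weights = cholesky_solve(cholesky(k_xx), resid)
    return [prior[n + j] + sum(rbf(t, train_x[i]) * weights[i] for i in range(n))
            for j, t in enumerate(test_x)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = [-1.0, 0.0, 1.0], [0.2, -0.4, 0.9]
    print(pathwise_posterior_sample(xs, ys, [-0.5, 0.5, 2.0], 1e-4, rng))
```

In this exact version the joint Cholesky factorization still costs cubic time in the number of training plus test points; the approximations mentioned in the summary target that cost, for example by replacing the exact prior draw with a cheaper approximate one while keeping the same correction step.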

This list is automatically generated from the titles and abstracts of the papers on this site.