Random ReLU Neural Networks as Non-Gaussian Processes
- URL: http://arxiv.org/abs/2405.10229v1
- Date: Thu, 16 May 2024 16:28:11 GMT
- Title: Random ReLU Neural Networks as Non-Gaussian Processes
- Authors: Rahul Parhi, Pakshal Bohra, Ayoub El Biari, Mehrsa Pourya, Michael Unser
- Abstract summary: We show that random neural networks with rectified linear unit activation functions are well-defined non-Gaussian processes.
As a by-product, we demonstrate that these networks are solutions to differential equations driven by impulsive white noise.
- Score: 20.607307985674428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent $3/2$. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
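The non-asymptotic viewpoint in the abstract (a width that is itself Poisson-distributed, with mean proportional to a density parameter) can be illustrated with a minimal one-dimensional sketch. This is not the paper's construction. The function name `sample_random_relu_network`, the choice of a homogeneous Poisson process for the activation thresholds, the standard Gaussian output weights, and the restriction to a single neuron orientation are all assumptions made for illustration.

```python
import random


def sample_random_relu_network(density, half_width, rng):
    """Draw one 1-D random ReLU network on [-half_width, half_width].

    Illustrative assumptions (not taken from the paper):
    - activation thresholds form a homogeneous Poisson process with
      intensity `density`, so the width is Poisson with mean
      2 * half_width * density;
    - output weights are i.i.d. standard Gaussian;
    - every neuron has the same orientation, ReLU(x - b).
    """
    knots = []
    # Exponential gaps between consecutive points give a Poisson process.
    t = -half_width + rng.expovariate(density)
    while t < half_width:
        knots.append(t)
        t += rng.expovariate(density)
    weights = [rng.gauss(0.0, 1.0) for _ in knots]

    def f(x):
        # Sum of weighted ReLU neurons, one per activation threshold.
        return sum(v * max(0.0, x - b) for v, b in zip(weights, knots))

    return knots, weights, f


rng = random.Random(0)
# Empirical check that the width has mean close to 2 * half_width * density.
widths = [len(sample_random_relu_network(5.0, 1.0, rng)[0]) for _ in range(2000)]
mean_width = sum(widths) / len(widths)
```

Under these assumptions, each call returns a different number of neurons, and `mean_width` should be close to `2 * 1.0 * 5.0`. Swapping the Gaussian weights for a heavy-tailed law is the kind of change the abstract associates with non-Gaussian limits.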
Related papers
- Deep Kernel Posterior Learning under Infinite Variance Prior Weights [1.5960546024967326]
We show that a Bayesian deep neural network converges to a process with $\alpha$-stable marginals in each layer that has a conditionally Gaussian representation.
We also provide useful generalizations of the results of Loría & Bhadra (2024) on shallow networks to multi-layer networks.
The computational and statistical benefits over competing approaches stand out in simulations and in demonstrations on benchmark data sets.
arXiv Detail & Related papers (2024-10-02T07:13:17Z)
- Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection [11.729744197698718]
We present an algorithmic framework to approximate a neural network of finite width and depth.
We iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes.
Our results can represent an important step towards understanding neural network predictions.
arXiv Detail & Related papers (2024-07-26T12:45:53Z)
- Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds [51.68005047958965]
We show that intrinsic Gaussian processes can achieve better performance in practice.
Our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency.
arXiv Detail & Related papers (2023-09-19T20:30:58Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes.
We introduce a wide class of efficient approximations built from this viewpoint.
We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Large-width functional asymptotics for deep Gaussian neural networks [2.7561479348365734]
We consider fully connected feed-forward deep neural networks where weights and biases are independent and identically distributed according to Gaussian distributions.
Our results contribute to recent theoretical studies on the interplay between infinitely wide deep neural networks and processes.
arXiv Detail & Related papers (2021-02-20T10:14:37Z)
- Non-asymptotic approximations of neural networks by Gaussian processes [7.56714041729893]
We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights.
As the width of a network goes to infinity, its law converges to that of a Gaussian process.
arXiv Detail & Related papers (2021-02-17T10:19:26Z)
- Infinitely Wide Tensor Networks as Gaussian Process [1.7894377200944511]
In this paper, we show the equivalence of infinitely wide tensor networks and the Gaussian process.
We implement the Gaussian Process corresponding to the infinite limit tensor networks and plot the sample paths of these models.
arXiv Detail & Related papers (2021-01-07T02:29:15Z)
- Pathwise Conditioning of Gaussian Processes [72.61885354624604]
Conventional approaches for simulating Gaussian process posteriors view samples as draws from marginal distributions of process values at finite sets of input locations.
This distribution-centric characterization leads to generative strategies that scale cubically in the size of the desired random vector.
We show how this pathwise interpretation of conditioning gives rise to a general family of approximations that lend themselves to efficiently sampling Gaussian process posteriors.
arXiv Detail & Related papers (2020-11-08T17:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.