Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
- URL: http://arxiv.org/abs/2507.12686v1
- Date: Wed, 16 Jul 2025 23:41:09 GMT
- Title: Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
- Authors: Krishnakumar Balasubramanian, Nathan Ross
- Abstract summary: We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. We establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-(1/6)^{L-1} + \epsilon}$, for any $\epsilon > 0$.
- Score: 15.424946932398713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the Finite-Dimensional Distributions (FDDs) of deep neural networks with randomly initialized weights that have finite-order moments. Specifically, we establish Gaussian approximation bounds in the Wasserstein-$1$ norm between the FDDs and their Gaussian limit assuming a Lipschitz activation function and allowing the layer widths to grow to infinity at arbitrary relative rates. In the special case where all widths are proportional to a common scale parameter $n$ and there are $L-1$ hidden layers, we obtain convergence rates of order $n^{-({1}/{6})^{L-1} + \epsilon}$, for any $\epsilon > 0$.
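A quick way to see the statement in practice is to draw many independent networks at a finite width, record the output at one fixed input (a one-dimensional FDD), and measure its Wasserstein-1 distance to a Gaussian reference. The sketch below does this with NumPy/SciPy; the ReLU activation, the particular widths, and the use of an empirically fitted Gaussian as the reference are illustrative assumptions, not the paper's construction.
```python
# Minimal sketch (not the authors' code): empirically checking the Gaussian
# approximation of a one-dimensional FDD of a randomly initialized deep network.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def random_network_output(x, widths, rng):
    """One forward pass with freshly drawn i.i.d. N(0, 1/fan_in) weights."""
    h = x
    num_layers = len(widths) - 1
    for i in range(num_layers):
        n_in, n_out = widths[i], widths[i + 1]
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        h = W @ h
        if i < num_layers - 1:          # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h

d, n, L = 5, 200, 3                     # input dim, width scale, L-1 = 2 hidden layers
widths = [d] + [n] * (L - 1) + [1]
x = rng.normal(size=d)

# One-dimensional FDD: the scalar output at a fixed input, over many weight draws.
samples = np.array([random_network_output(x, widths, rng)[0] for _ in range(5000)])

# Gaussian reference fitted to the empirical mean/variance (a stand-in for the
# exact Gaussian limit, whose covariance would come from the limiting kernel).
reference = rng.normal(samples.mean(), samples.std(), size=5000)
print("empirical Wasserstein-1 distance at width n =", n, ":",
      wasserstein_distance(samples, reference))
```
Rerunning with larger $n$ should shrink the reported distance, in line with the $n^{-(1/6)^{L-1}+\epsilon}$ rate quoted above, although this toy check does not separate the approximation error from Monte Carlo noise.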
Related papers
- Approximation and Generalization Abilities of Score-based Neural Network Generative Models for Sub-Gaussian Distributions [18.375250624200373]
We study the approximation and generalization abilities of score-based neural network generative models (SGMs). Our framework is universal and can be used to establish convergence rates for SGMs under milder assumptions than previous work. Our analysis removes several crucial assumptions, such as Lipschitz continuity of the score function or a strictly positive lower bound on the target density.
arXiv Detail & Related papers (2025-05-16T05:38:28Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Measurement-induced phase transition for free fermions above one dimension [46.176861415532095]
A theory of the measurement-induced entanglement phase transition for free-fermion models in $d>1$ dimensions is developed.
The critical point separates a gapless phase with $\ell^{d-1} \ln \ell$ scaling of the second cumulant of the particle number and of the entanglement entropy.
arXiv Detail & Related papers (2023-09-21T18:11:04Z) - Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z) - Wide neural networks: From non-gaussian random fields at initialization
to the NTK geometry of training [0.0]
Recent developments in applications of artificial neural networks with over $n=10^{14}$ parameters make it extremely important to study the large-$n$ behaviour of such networks.
Most works studying wide neural networks have focused on the infinite-width $n \to +\infty$ limit of such networks.
In this work we will study their behavior for large, but finite $n$.
arXiv Detail & Related papers (2023-04-06T21:34:13Z) - A Nearly Tight Bound for Fitting an Ellipsoid to Gaussian Random Points [50.90125395570797]
This nearly establishes a conjecture of Saunderson et al. (2012), within logarithmic factors.
The latter conjecture has attracted significant attention over the past decade, due to its connections to machine learning and sum-of-squares lower bounds for certain statistical problems.
arXiv Detail & Related papers (2022-12-21T17:48:01Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Convergence of Langevin Monte Carlo in Chi-Squared and Renyi Divergence [8.873449722727026]
For convex and first-order smooth potentials, we show that the LMC algorithm achieves the rate estimate $\widetilde{\mathcal{O}}(d\epsilon^{-1})$, which improves the previously known rates in both of these metrics.
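For context, the (unadjusted) Langevin Monte Carlo iteration referred to above is gradient descent on the potential plus injected Gaussian noise, $x_{k+1} = x_k - \eta \nabla f(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim N(0, I)$. Below is a minimal sketch under an illustrative convex, first-order smooth potential (a standard Gaussian target); the step size and iteration count are arbitrary choices, not taken from the paper.
```python
# Minimal sketch of the (unadjusted) Langevin Monte Carlo iteration:
#   x_{k+1} = x_k - eta * grad_f(x_k) + sqrt(2 * eta) * xi_k,  xi_k ~ N(0, I).
# The quadratic potential below (standard Gaussian target) is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
d, eta, num_steps = 10, 1e-2, 5000

def grad_f(x):
    # Potential f(x) = ||x||^2 / 2, so the target density is N(0, I_d).
    return x

x = np.zeros(d)
samples = []
for _ in range(num_steps):
    x = x - eta * grad_f(x) + np.sqrt(2 * eta) * rng.normal(size=d)
    samples.append(x.copy())

samples = np.array(samples[num_steps // 2:])   # discard burn-in
print("empirical mean norm:", np.linalg.norm(samples.mean(axis=0)))
print("empirical per-coordinate variance:", samples.var(axis=0).mean())
```
With this step size, the empirical per-coordinate variance should sit near 1 (up to an $O(\eta)$ discretization bias), which is the target $N(0, I_d)$ behaviour.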
arXiv Detail & Related papers (2020-07-22T18:18:28Z) - A Universal Approximation Theorem of Deep Neural Networks for Expressing
Probability Distributions [12.100913944042972]
We prove that there exists a deep neural network $g:\mathbb{R}^d \rightarrow \mathbb{R}$ with ReLU activation.
The size of the neural network can grow exponentially in $d$ when the $1$-Wasserstein distance is used as the discrepancy.
arXiv Detail & Related papers (2020-04-19T14:45:47Z) - Curse of Dimensionality on Randomized Smoothing for Certifiable
Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
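As background for the snippet above, randomized smoothing turns a base classifier into a smoothed one that predicts the class most frequently returned when Gaussian noise is added to the input. The sketch below illustrates only that prediction step, with a toy linear base classifier; the model, noise level, and sample count are illustrative assumptions and this is not the paper's code.
```python
# Minimal sketch of Gaussian randomized smoothing: the smoothed classifier
# predicts the class the base classifier outputs most often under input noise.
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, sigma, num_samples = 4, 32, 0.5, 1000
W = rng.normal(size=(num_classes, dim))      # toy base classifier: argmax of W @ x

def base_predict(x):
    return int(np.argmax(W @ x))

def smoothed_predict(x, sigma, num_samples):
    # Monte Carlo estimate of argmax_c P(base_predict(x + noise) = c).
    votes = np.zeros(num_classes, dtype=int)
    for _ in range(num_samples):
        votes[base_predict(x + sigma * rng.normal(size=dim))] += 1
    return int(np.argmax(votes)), votes

x = rng.normal(size=dim)
pred, votes = smoothed_predict(x, sigma, num_samples)
print("smoothed prediction:", pred, "vote counts:", votes)
```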
arXiv Detail & Related papers (2020-02-08T22:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.