A Universal Approximation Theorem of Deep Neural Networks for Expressing
Probability Distributions
- URL: http://arxiv.org/abs/2004.08867v3
- Date: Mon, 16 Nov 2020 04:29:47 GMT
- Title: A Universal Approximation Theorem of Deep Neural Networks for Expressing
Probability Distributions
- Authors: Yulong Lu and Jianfeng Lu
- Abstract summary: We prove that there exists a deep neural network $g:\mathbb{R}^d\rightarrow \mathbb{R}$ with ReLU activation.
The size of the neural network can grow exponentially in $d$ when the $1$-Wasserstein distance is used as the discrepancy.
- Score: 12.100913944042972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the universal approximation property of deep neural
networks for representing probability distributions. Given a target
distribution $\pi$ and a source distribution $p_z$ both defined on
$\mathbb{R}^d$, we prove under some assumptions that there exists a deep neural
network $g:\mathbb{R}^d\rightarrow \mathbb{R}$ with ReLU activation such that
the push-forward measure $(\nabla g)_\# p_z$ of $p_z$ under the map $\nabla g$
is arbitrarily close to the target measure $\pi$. The closeness is measured by
three classes of integral probability metrics between probability
distributions: the $1$-Wasserstein distance, maximum mean discrepancy (MMD), and
kernelized Stein discrepancy (KSD). We prove upper bounds for the size (width
and depth) of the deep neural network in terms of the dimension $d$ and the
approximation error $\varepsilon$ with respect to the three discrepancies. In
particular, the size of the neural network can grow exponentially in $d$ when
the $1$-Wasserstein distance is used as the discrepancy, whereas for both MMD and
KSD the size of the neural network depends on $d$ at most polynomially. Our
proof relies on convergence estimates of empirical measures under the
aforementioned discrepancies and on semi-discrete optimal transport.
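As a rough numerical companion to the abstract (an illustration only, not the paper's construction or proof), the sketch below pushes samples from a Gaussian source $p_z$ through the gradient map $\nabla g$ of a small, untrained ReLU network $g:\mathbb{R}^d\rightarrow\mathbb{R}$, and measures the discrepancy between the push-forward $(\nabla g)_\# p_z$ and a stand-in target with a Gaussian-kernel MMD estimate. For a ReLU network $\nabla g$ is defined almost everywhere and is piecewise constant; the sketch simply evaluates it with autograd. The architecture, kernel bandwidth, and target distribution are arbitrary, hypothetical choices.

```python
# Minimal sketch: push samples from a source p_z through the gradient map of a
# small ReLU network g: R^d -> R, then estimate the squared MMD to a target sample.
# The network, bandwidth, and target below are illustrative choices only.
import torch

d = 2
g = torch.nn.Sequential(               # g: R^d -> R with ReLU activation
    torch.nn.Linear(d, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def grad_g(z):
    """Evaluate the (a.e. defined) gradient map grad g at a batch of points z."""
    z = z.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(g(z).sum(), z)
    return grad

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD with a Gaussian kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

z = torch.randn(1000, d)                     # source p_z: standard Gaussian
target = torch.randn(1000, d) * 0.5 + 2.0    # stand-in target pi (illustrative)
pushed = grad_g(z)                           # samples from (grad g)_# p_z
print("MMD^2 between (grad g)_# p_z and target:", mmd2(pushed, target).item())
```

Training $g$ to reduce this MMD estimate would mimic the approximation statement; no such optimization is attempted here.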
Related papers
- New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions.
We broaden our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^d,\mathbb{R}^d)$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z)
- Relative-Translation Invariant Wasserstein Distance [82.6068808353647]
We introduce a new family of distances, relative-translation invariant Wasserstein distances ($RW_p$).
We show that $RW_p$ distances are also real distance metrics defined on the quotient set $\mathcal{P}_p(\mathbb{R}^n)/\sim$ invariant to distribution translations.
arXiv Detail & Related papers (2024-09-04T03:41:44Z)
- A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$.
For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-
arXiv Detail & Related papers (2024-06-15T06:43:45Z)
- Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable.
We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z)
- Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNNs).
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z)
- Quantitative CLTs in Deep Neural Networks [12.845031126178593]
We study the output distribution of a fully connected neural network with random Gaussian weights and biases.
We obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth.
Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature.
arXiv Detail & Related papers (2023-07-12T11:35:37Z)
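As a quick empirical companion to the Quantitative CLTs entry above (a simulation only, not the paper's quantitative bounds), the sketch below samples many independent fully connected ReLU networks with Gaussian weights and biases, evaluates each at one fixed input, and compares the resulting output distribution to a fitted Gaussian. The width, depth, input, sample count, and normality check are illustrative choices.

```python
# Simulation sketch: outputs of random Gaussian-initialized ReLU networks at a
# fixed input, compared against a fitted Gaussian via a Kolmogorov-Smirnov distance.
# All sizes below are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d_in, width, depth, n_nets = 10, 256, 3, 5000
x = rng.standard_normal(d_in)

def random_relu_net_output(rng, x, width, depth):
    """One forward pass through a freshly sampled Gaussian-initialized ReLU net."""
    h, fan_in = x, x.shape[0]
    for _ in range(depth):
        W = rng.standard_normal((width, fan_in)) / np.sqrt(fan_in)
        b = rng.standard_normal(width) * 0.1
        h = np.maximum(W @ h + b, 0.0)
        fan_in = width
    w_out = rng.standard_normal(width) / np.sqrt(width)
    return w_out @ h

outputs = np.array([random_relu_net_output(rng, x, width, depth) for _ in range(n_nets)])
ks_stat, _ = stats.kstest(outputs, "norm", args=(outputs.mean(), outputs.std()))
print(f"KS distance to a fitted Gaussian at width {width}: {ks_stat:.4f}")
```

Repeating the experiment at several widths would show the distance shrinking as the width grows, in the spirit (though not with the rates) of the quantitative bounds.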
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks [10.368585938419619]
Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$.
What does this collection of probability distributions look like when $n,d$ grow large?
Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$.
arXiv Detail & Related papers (2022-06-14T00:07:33Z)
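As a small simulation in the spirit of the projection-pursuit entry above (an illustration of the question, not of the paper's bounds), the sketch below projects one fixed cloud of $n$ points in $\mathbb{R}^d$ onto many random one-dimensional directions and checks how close the projected empirical distributions are to a Gaussian; the paper studies the full set of $m$-dimensional projections, while $m=1$, the data model, and the normality check here are illustrative choices.

```python
# Simulation sketch: empirical distributions of random 1-D projections of a
# fixed, deliberately non-Gaussian point cloud. Sizes are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, n_projections = 2000, 500, 200
X = rng.standard_normal((n, d)) ** 3            # heavy-tailed, non-Gaussian coordinates

ks_distances = []
for _ in range(n_projections):
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                      # random unit direction in R^d
    proj = X @ u                                # 1-D projection of every data point
    ks, _ = stats.kstest(proj, "norm", args=(proj.mean(), proj.std()))
    ks_distances.append(ks)

# Most random directions look nearly Gaussian even though the cloud itself is not,
# which is the classical backdrop for characterizing the attainable projections.
print(f"median KS distance to a fitted Gaussian over {n_projections} directions:"
      f" {np.median(ks_distances):.4f}")
```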
- A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ on the robustness of interpolating neural network parameters on arbitrary distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z)
- Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is lower-bounded.
Experiments reveal, for the first time, a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
- Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient [9.309655246559094]
We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.
Our analysis relies on connecting the pruning of random ReLU networks to random instances of the \textsc{SubsetSum} problem; a toy illustration follows this list.
arXiv Detail & Related papers (2020-06-14T19:32:10Z)
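To make the SubsetSum connection referenced above concrete (a toy illustration only, not the paper's construction), the sketch below approximates a single target weight by the sum of a chosen subset of random values, which is the elementary step behind pruning-based approximation guarantees. The brute-force search and the sample sizes are illustrative choices.

```python
# Toy sketch of the SubsetSum idea behind lottery-ticket pruning guarantees:
# approximate one target weight by the sum of a subset of random candidate weights.
# Brute-force search over all subsets; sizes are tiny and purely illustrative.
from itertools import combinations
import random

random.seed(0)
target = 0.37                                              # a single target network weight
candidates = [random.uniform(-1, 1) for _ in range(12)]    # over-provisioned random weights

best_subset, best_err = (), abs(target)
for r in range(1, len(candidates) + 1):
    for subset in combinations(range(len(candidates)), r):
        err = abs(target - sum(candidates[i] for i in subset))
        if err < best_err:
            best_subset, best_err = subset, err

print(f"kept {len(best_subset)} of {len(candidates)} random weights, "
      f"approximation error {best_err:.5f}")
```

With only logarithmically many candidates per target weight, such subset sums already achieve small error with high probability, which is the intuition behind the logarithmic over-parameterization bound.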
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.