Universal Approximation in Dropout Neural Networks
- URL: http://arxiv.org/abs/2012.10351v1
- Date: Fri, 18 Dec 2020 16:53:15 GMT
- Title: Universal Approximation in Dropout Neural Networks
- Authors: Oxana A. Manita, Mark A. Peletier, Jacobus W. Portegies, Jaron
Sanders, Albert Senen-Cerda
- Abstract summary: We prove two universal approximation theorems for a range of dropout neural networks.
In the first, each edge output is multiplied by its random filter, resulting in a random output.
In the second, each edge output is multiplied by the expectation of its filter, leading to a deterministic output.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We prove two universal approximation theorems for a range of dropout neural
networks. These are feed-forward neural networks in which each edge is given a
random $\{0,1\}$-valued filter, that have two modes of operation: in the first
each edge output is multiplied by its random filter, resulting in a random
output, while in the second each edge output is multiplied by the expectation
of its filter, leading to a deterministic output. It is common to use the
random mode during training and the deterministic mode during testing and
prediction.
Both theorems are of the following form: Given a function to approximate and
a threshold $\varepsilon>0$, there exists a dropout network that is
$\varepsilon$-close in probability and in $L^q$. The first theorem applies to
dropout networks in the random mode. It assumes little on the activation
function, applies to a wide class of networks, and can even be applied to
approximation schemes other than neural networks. The core is an algebraic
property that shows that deterministic networks can be exactly matched in
expectation by random networks. The second theorem makes stronger assumptions
and gives a stronger result. Given a function to approximate, it provides
existence of a network that approximates in both modes simultaneously. Proof
components are a recursive replacement of edges by independent copies, and a
special first-layer replacement that couples the resulting larger network to
the input.
The functions to be approximated are assumed to be elements of general normed
spaces, and the approximations are measured in the corresponding norms. The
networks are constructed explicitly. Because of the different methods of proof,
the two results give independent insight into the approximation properties of
random dropout networks. With this, we establish that dropout neural networks
broadly satisfy a universal-approximation property.
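The two modes of operation described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration under assumed names (edge_dropout_forward, p_keep), not the paper's explicit construction: in the random mode every edge gets its own Bernoulli filter, in the deterministic mode every weight is scaled by the filter's expectation, and for the linear part of a single layer the two agree in expectation, which is the kind of matching-in-expectation the first theorem builds on.

```python
# Minimal sketch (assumed names, not the paper's construction) of the two
# dropout modes: per-edge {0,1}-valued filters vs. their expectation.
import numpy as np

rng = np.random.default_rng(0)

def edge_dropout_forward(W, b, x, p_keep, mode, act=lambda z: np.maximum(z, 0.0)):
    """One layer act(W_eff @ x + b) with per-edge dropout filters."""
    if mode == "random":
        # Random mode: each edge (i, j) is multiplied by its own
        # Bernoulli(p_keep) filter, so the layer output is a random variable.
        F = rng.binomial(1, p_keep, size=W.shape)
        W_eff = F * W
    elif mode == "deterministic":
        # Deterministic mode: each edge is multiplied by the expectation
        # p_keep of its filter, giving a deterministic output.
        W_eff = p_keep * W
    else:
        raise ValueError(f"unknown mode: {mode}")
    return act(W_eff @ x + b)

# For the linear part of one layer, E[F * W] = p_keep * W, so averaging many
# random-mode passes (with identity activation) approaches the deterministic
# pre-activation; the paper extends this matching-in-expectation to networks.
W, b, x = rng.normal(size=(3, 4)), np.zeros(3), rng.normal(size=4)
mean_random = np.mean([edge_dropout_forward(W, b, x, 0.8, "random", act=lambda z: z)
                       for _ in range(20000)], axis=0)
deterministic = edge_dropout_forward(W, b, x, 0.8, "deterministic", act=lambda z: z)
print(np.max(np.abs(mean_random - deterministic)))  # small; shrinks with more samples
```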
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- Multi-layer random features and the approximation power of neural networks [4.178980693837599]
We prove that a reproducing kernel Hilbert space contains only functions that can be approximated by the architecture.
We show that if the eigenvalues of the integral operator of the NNGP decay slower than $k^{-n-\frac{2}{3}}$, where $k$ is the order of an eigenvalue, our theorem guarantees a more succinct neural network approximation than Barron's theorem.
arXiv Detail & Related papers (2024-04-26T14:57:56Z)
- Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability.
We show how the result can be used to get approximations of required accuracy in a model reference adaptive control application.
arXiv Detail & Related papers (2024-03-25T19:39:17Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Finding Everything within Random Binary Networks [11.689913953698081]
We prove that any target network can be approximated up to arbitrary accuracy by simply pruning a random network of binary $\pm 1$ weights that is only a polylogarithmic factor wider and deeper than the target network.
arXiv Detail & Related papers (2021-10-18T03:19:25Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- How Powerful are Shallow Neural Networks with Bandlimited Random Weights? [25.102870584507244]
We investigate the expressive power of depth-2 bandlimited random neural networks.
A random net is a neural network whose hidden-layer parameters are frozen at randomly assigned bandlimited values.
arXiv Detail & Related papers (2020-08-19T13:26:12Z)
- Random Vector Functional Link Networks for Function Approximation on Manifolds [8.535815777849786]
Single-layer neural networks with random input-to-hidden-layer weights and biases have seen success in practice.
We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space.
arXiv Detail & Related papers (2020-07-30T23:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.