A closer look at the approximation capabilities of neural networks
- URL: http://arxiv.org/abs/2002.06505v1
- Date: Sun, 16 Feb 2020 04:58:43 GMT
- Title: A closer look at the approximation capabilities of neural networks
- Authors: Kai Fong Ernest Chong
- Abstract summary: A feedforward neural network with one hidden layer is able to approximate any continuous function $f$ to any given approximation threshold $varepsilon$.
We show that this uniform approximation property still holds even under seemingly strong conditions imposed on the weights.
- Score: 6.09170287691728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The universal approximation theorem, in one of its most general versions,
says that if we consider only continuous activation functions $\sigma$, then a
standard feedforward neural network with one hidden layer is able to
approximate any continuous multivariate function $f$ to any given approximation
threshold $\varepsilon$, if and only if $\sigma$ is non-polynomial. In this
paper, we give a direct algebraic proof of the theorem. Furthermore we shall
explicitly quantify the number of hidden units required for approximation.
Specifically, if $X\subseteq \mathbb{R}^n$ is compact, then a neural network
with $n$ input units, $m$ output units, and a single hidden layer with
$\binom{n+d}{d}$ hidden units (independent of $m$ and $\varepsilon$), can
uniformly approximate any polynomial function $f:X \to \mathbb{R}^m$ whose
total degree is at most $d$ for each of its $m$ coordinate functions. In the
general case that $f$ is any continuous function, we show there exists some
$N\in \mathcal{O}(\varepsilon^{-n})$ (independent of $m$), such that $N$ hidden
units would suffice to approximate $f$. We also show that this uniform
approximation property (UAP) still holds even under seemingly strong conditions
imposed on the weights. We highlight several consequences: (i) For any $\delta
> 0$, the UAP still holds if we restrict all non-bias weights $w$ in the last
layer to satisfy $|w| < \delta$. (ii) There exists some $\lambda>0$ (depending
only on $f$ and $\sigma$), such that the UAP still holds if we restrict all
non-bias weights $w$ in the first layer to satisfy $|w|>\lambda$. (iii) If the
non-bias weights in the first layer are \emph{fixed} and randomly chosen from a
suitable range, then the UAP holds with probability $1$.
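As a quick numerical illustration of consequence (iii) (a minimal sketch, not the paper's construction): fix the first-layer non-bias weights at random, keep them away from zero, and fit only the last layer by least squares to a degree-3 polynomial on $[-1,1]^2$, using $\binom{2+3}{3} = 10$ hidden units. The choice of $\sigma = \tanh$, the sampling scheme, and all names below are illustrative assumptions.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

n, d = 2, 3                          # input dimension and max total degree
N = comb(n + d, d)                   # binom(n+d, d) = binom(5, 3) = 10 hidden units

def f(x):
    # Target: a degree-3 polynomial f : X -> R (one output coordinate, m = 1).
    return x[:, 0] ** 3 - 2.0 * x[:, 0] * x[:, 1] + x[:, 1] ** 2

# Compact domain X = [-1, 1]^2, sampled at random points.
X = rng.uniform(-1.0, 1.0, size=(2000, n))

# First layer: non-bias weights fixed and randomly chosen, bounded away from zero
# (in the spirit of consequences (ii) and (iii)); biases chosen at random too.
W = rng.uniform(1.0, 2.0, size=(n, N)) * rng.choice([-1.0, 1.0], size=(n, N))
b = rng.uniform(-1.0, 1.0, size=N)

H = np.tanh(X @ W + b)               # hidden layer with a non-polynomial sigma

# Last layer: only these weights are trained, here by linear least squares.
design = np.column_stack([H, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(design, f(X), rcond=None)

pred = design @ coef
print(f"hidden units: {N}, max |f - network| on samples: {np.max(np.abs(pred - f(X))):.4f}")
```

The printed error only shows how well this particular random draw fits the sampled points; the theorem's guarantee concerns the existence of suitable weights, not this least-squares procedure.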
Related papers
- LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
We show that for any $K$, there is a "universal set" $U \subset [n]$ of size independent of $n$, such that for any $Q$ and any row $i$, the large attention scores $A_{i,j}$ in row $i$ of $A$ all have $j \in U$.
We empirically show the benefits of our scheme for vision transformers, showing how to train new models that use our universal set during training as well.
arXiv Detail & Related papers (2024-10-07T19:47:13Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Noncompact uniform universal approximation [0.0]
The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb{R}^n$.
All continuous functions that vanish at infinity can be uniformly approximated by neural networks.
arXiv Detail & Related papers (2023-08-07T08:54:21Z) - The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $\delta>0$, we construct CNF and DNF formulas of polynomial size with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$.
We show that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega((n/4^{k}k^{2})^{1-\delta})$, and $\Omega((n/4^{k}k^{2})^{1-\delta})$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z) - On the Multidimensional Random Subset Sum Problem [0.9007371440329465]
In the Random Subset Sum Problem, given $n$ i.i.d. random variables $X_1, \ldots, X_n$, we wish to approximate any point $z \in [-1,1]$ as the sum of a subset $X_{i_1(z)}, \ldots, X_{i_s(z)}$ of them, up to error $\varepsilon$.
We prove that, in $d$ dimensions, $n = O(d^3 \log \frac{1}{\varepsilon} \cdot \ldots)$.
arXiv Detail & Related papers (2022-07-28T08:10:43Z) - A general approximation lower bound in $L^p$ norm, with applications to
feed-forward neural networks [0.0]
We study the fundamental limits to the expressive power of neural networks.
We first prove a general lower bound on how well functions in $F$ can be approximated in $L^p(\mu)$ norm.
We then instantiate this bound to the case where $G$ corresponds to a piecewise-polynomial feed-forward neural network.
arXiv Detail & Related papers (2022-06-09T09:01:05Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - Neural Network Approximation: Three Hidden Layers Are Enough [4.468952886990851]
A three-hidden-layer neural network with super approximation power is introduced.
The network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron.
arXiv Detail & Related papers (2020-10-25T18:30:57Z) - Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced.
This network is built with the Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both \emph{compatible} with oracles, and achieves complexity bounds of $\tilde{O}((n^2 + n\sqrt{L/\mu})\log(1/\epsilon))$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)