A Minimal Control Family of Dynamical Systems for Universal Approximation
- URL: http://arxiv.org/abs/2312.12903v1
- Date: Wed, 20 Dec 2023 10:36:55 GMT
- Title: A Minimal Control Family of Dynamical Systems for Universal Approximation
- Authors: Yifei Duan, Yongqiang Cai
- Abstract summary: We show that a control family can generate flow maps that approximate diffeomorphisms of $\mathbb{R}^d$ on any compact domain.
Our result reveals an underlying connection between the approximation power of neural networks and control systems.
- Score: 6.164223149261533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The universal approximation property (UAP) of neural networks is a
fundamental characteristic of deep learning. It is widely recognized that a
composition of linear functions and non-linear functions, such as the rectified
linear unit (ReLU) activation function, can approximate continuous functions on
compact domains. In this paper, we extend this efficacy to the scenario of
dynamical systems with controls. We prove that the control family
$\mathcal{F}_1 = \mathcal{F}_0 \cup \{ \text{ReLU}(\cdot)\} $ is enough to
generate flow maps that can uniformly approximate diffeomorphisms of
$\mathbb{R}^d$ on any compact domain, where $\mathcal{F}_0 = \{x \mapsto Ax+b:
A\in \mathbb{R}^{d\times d}, b \in \mathbb{R}^d\}$ is the set of linear maps
and the dimension $d\ge2$. Since $\mathcal{F}_1$ contains only one nonlinear
function and $\mathcal{F}_0$ does not hold the UAP, we call $\mathcal{F}_1$ a
minimal control family for UAP. Based on this, some sufficient conditions, such
as the affine invariance, on the control family are established and discussed.
Our result reveals an underlying connection between the approximation power of
neural networks and control systems.
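To make the construction concrete, the following is a minimal numerical sketch (not taken from the paper) of flow maps generated by the control family $\mathcal{F}_1 = \mathcal{F}_0 \cup \{\text{ReLU}(\cdot)\}$: the exact time-$t$ flow of an affine field $\dot{x} = Ax + b$ is alternated with the exact flow of $\dot{x} = \text{ReLU}(x)$, and the two are composed on a compact set in $\mathbb{R}^2$. The helper names `linear_flow` and `relu_flow`, the particular fields $A$, $b$, and the switching times are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch: composing flow maps of fields from F_1 = F_0 ∪ {ReLU}.
import numpy as np
from scipy.linalg import expm

def linear_flow(A, b, t, x):
    """Exact time-t flow of the affine field x' = A x + b, applied to a batch
    of points x with shape (n, d) (rows are points)."""
    d = A.shape[0]
    # Augmented-matrix trick: the affine flow is read off from exp(t * [[A, b], [0, 0]]).
    M = np.zeros((d + 1, d + 1))
    M[:d, :d] = A
    M[:d, d] = b
    E = expm(t * M)
    return x @ E[:d, :d].T + E[:d, d]

def relu_flow(t, x):
    """Exact time-t flow of x' = ReLU(x): a positive coordinate grows by the
    factor e^t, a non-positive coordinate stays fixed (componentwise)."""
    return np.where(x > 0, np.exp(t) * x, x)

# Compose a few flows on points drawn from a compact set in R^2 (d >= 2, as in the paper).
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(5, 2))
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # a rotation-generating linear field
b = np.array([0.2, -0.1])

out = pts
for _ in range(3):                         # a short alternating composition
    out = linear_flow(A, b, 0.5, out)
    out = relu_flow(0.3, out)
print(out)                                 # the composed flow map applied to pts
```

Both flows are written in closed form: the affine flow via the matrix exponential of an augmented matrix, and the ReLU flow by scaling positive coordinates by $e^t$ while leaving non-positive ones fixed, which is the kind of piecewise behaviour the composition exploits.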
Related papers
- Representing Piecewise-Linear Functions by Functions with Minimal Arity [0.5266869303483376]
We show that the tessellation of the input space $\mathbb{R}^n$ induced by the function $F$ has a direct connection to the number of arguments in the $\max$ functions.
arXiv Detail & Related papers (2024-06-04T15:39:08Z) - Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ with an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Tractability of approximation by general shallow networks [0.0]
We consider approximation of functions of the form $x \mapsto \int_{\mathbb{Y}} G(x, y)\, d\tau(y)$, $x \in \mathbb{X}$, by $G$-networks of the form $x \mapsto \sum_{k=1}^{n} a_k G(x, y_k)$.
We obtain dimension-independent bounds on the degree of approximation in terms of $n$, where the constants involved depend on the dimensions.
arXiv Detail & Related papers (2023-08-07T00:14:46Z) - Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal
Approximation [10.249623880822055]
This paper examines a uniform UAP for the function class $C(K, \mathbb{R}^{d_y})$.
It gives the exact minimum width of the leaky-ReLU NN as $w_{\min} = \max(d_x, d_y) + \Delta(d_x, d_y)$.
arXiv Detail & Related papers (2023-05-29T06:51:16Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Local approximation of operators [0.0]
We study the problem of determining the degree of approximation of a non-linear operator between metric spaces $\mathfrak{X}$ and $\mathfrak{Y}$.
We establish constructive methods to do this efficiently, i.e., with the constants involved in the estimates on the approximation on $\mathbb{S}^d$ being $\mathcal{O}(d^{1/6})$.
arXiv Detail & Related papers (2022-02-13T19:28:34Z) - On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In particular, when the input layer is one-dimensional, minimal representations always use at most $n+1$ neurons, but in all higher-dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z) - Submodular + Concave [53.208470310734825]
It has been well established that first order optimization methods can converge to the maximal objective value of concave functions.
In this work, we study the maximization of functions of the form $F(x) = G(x) + C(x)$ over a convex body, where $G$ is a smooth DR-submodular function and $C$ is a smooth concave function.
This class of functions is an extension of both concave and continuous DR-submodular functions for which no guarantee is known.
arXiv Detail & Related papers (2021-06-09T01:59:55Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.