Universal Approximation Power of Deep Residual Neural Networks via
Nonlinear Control Theory
- URL: http://arxiv.org/abs/2007.06007v4
- Date: Fri, 9 Feb 2024 16:13:20 GMT
- Title: Universal Approximation Power of Deep Residual Neural Networks via
Nonlinear Control Theory
- Authors: Paulo Tabuada and Bahman Gharesifard
- Abstract summary: We explain the universal approximation capabilities of deep residual neural networks through geometric nonlinear control.
Inspired by recent work establishing links between residual networks and control systems, we provide a general sufficient condition for a residual network to have the power of universal approximation.
- Score: 9.210074587720172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explain the universal approximation capabilities of deep
residual neural networks through geometric nonlinear control. Inspired by
recent work establishing links between residual networks and control systems,
we provide a general sufficient condition for a residual network to have the
power of universal approximation by asking the activation function, or one of
its derivatives, to satisfy a quadratic differential equation. Many activation
functions used in practice satisfy this assumption, exactly or approximately,
and we show this property to be sufficient for an adequately deep neural
network with $n+1$ neurons per layer to approximate arbitrarily well, on a
compact set and with respect to the supremum norm, any continuous function from
$\mathbb{R}^n$ to $\mathbb{R}^n$. We further show this result to hold for very
simple architectures for which the weights only need to assume two values. The
first key technical contribution consists of relating the universal
approximation problem to controllability of an ensemble of control systems
corresponding to a residual network and of leveraging classical Lie algebraic
techniques to characterize controllability. The second technical contribution
is to identify monotonicity as the bridge between controllability of finite
ensembles and uniform approximability on compact sets.
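As a rough numerical companion to the abstract (not the authors' construction, which uses $n+1$ neurons per layer and weights taking only two values), the sketch below reads a generic residual network as a forward-Euler discretization of the control system $\dot{x} = \tanh(W(t)x + b(t))$ and checks that $\tanh$ satisfies the quadratic differential equation $\tanh' = 1 - \tanh^2$, one instance of the activation condition above; the logistic sigmoid similarly satisfies $\sigma' = \sigma(1-\sigma)$. The width $n$, depth $L$, random weights, and choice of $\tanh$ are illustrative assumptions.

```python
import numpy as np

def residual_forward(x, weights, biases):
    """Propagate x through L residual layers x <- x + (1/L) * tanh(W x + b),
    i.e. a forward-Euler discretization of dx/dt = tanh(W(t) x + b(t))."""
    L = len(weights)
    for W, b in zip(weights, biases):
        x = x + (1.0 / L) * np.tanh(W @ x + b)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, L = 3, 50   # illustrative width and depth, not the paper's (n+1)-neuron construction
    weights = [0.5 * rng.standard_normal((n, n)) for _ in range(L)]
    biases = [0.1 * rng.standard_normal(n) for _ in range(L)]
    x0 = rng.standard_normal(n)
    print("output:", residual_forward(x0, weights, biases))

    # tanh satisfies the quadratic ODE tanh'(s) = 1 - tanh(s)^2; verify numerically.
    s = np.linspace(-3.0, 3.0, 7)
    h = 1e-6
    numerical = (np.tanh(s + h) - np.tanh(s - h)) / (2.0 * h)
    print("max |tanh' - (1 - tanh^2)|:", np.max(np.abs(numerical - (1.0 - np.tanh(s) ** 2))))
```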
Related papers
- Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces [0.0]
We consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks.
We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function.
arXiv Detail & Related papers (2024-05-10T14:31:58Z) - Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Interpolation, Approximation and Controllability of Deep Neural Networks [18.311411538309425]
We consider two properties that arise from supervised learning, namely universal interpolation and universal approximation.
We give a characterisation of universal equivalence, showing that it holds for essentially any architecture with non-linearity.
arXiv Detail & Related papers (2023-09-12T07:29:47Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks with inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - Linear approximability of two-layer neural networks: A comprehensive
analysis based on spectral decay [4.042159113348107]
We first consider the case of a single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associated kernel.
We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We provide an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, in time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Complexity Measures for Neural Networks with General Activation
Functions Using Path-based Norms [18.487936403345167]
A simple approach is proposed to obtain complexity controls for neural networks with general activation functions.
We consider two-layer networks and deep residual networks, for which path-based norms are derived to control complexities; one such path norm is sketched below.
arXiv Detail & Related papers (2020-09-14T01:15:11Z)
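Below is a minimal sketch of the kind of path-based quantity the last entry refers to, assuming one common form of a path norm for a two-layer network $f(x) = \sum_j a_j\,\mathrm{relu}(w_j^\top x + b_j)$, namely $\sum_j |a_j|(\|w_j\|_1 + |b_j|)$; the exact definition derived in that paper may differ, and the network shape, ReLU activation, and bias term are assumptions here.

```python
import numpy as np

def path_norm(a, W, b):
    """One common two-layer path norm: sum_j |a_j| * (||w_j||_1 + |b_j|).
    a: (m,) output weights, W: (m, d) hidden weights, b: (m,) hidden biases."""
    return float(np.sum(np.abs(a) * (np.abs(W).sum(axis=1) + np.abs(b))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d = 8, 3   # hypothetical hidden width and input dimension
    a = rng.standard_normal(m)
    W = rng.standard_normal((m, d))
    b = rng.standard_normal(m)
    print("path norm:", path_norm(a, W, b))
```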