Approximation properties of neural ODEs
- URL: http://arxiv.org/abs/2503.15696v2
- Date: Tue, 30 Sep 2025 15:03:00 GMT
- Title: Approximation properties of neural ODEs
- Authors: Arturo De Marinis, Davide Murari, Elena Celledoni, Nicola Guglielmi, Brynjulf Owren, Francesco Tudisco
- Abstract summary: We prove the universal approximation property (UAP), in the space of continuous functions, of shallow neural networks whose activation function is the flow map of a neural ODE. In particular, we constrain the Lipschitz constant of the flow map and the norms of the weights to increase the network's stability.
- Score: 5.828989070109041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the approximation properties of shallow neural networks whose activation function is defined as the flow map of a neural ordinary differential equation (neural ODE) at the final time of the integration interval. We prove the universal approximation property (UAP) of such shallow neural networks in the space of continuous functions. Furthermore, we investigate the approximation properties of shallow neural networks whose parameters satisfy specific constraints. In particular, we constrain the Lipschitz constant of the neural ODE's flow map and the norms of the weights to increase the network's stability. We prove that the UAP holds if we consider either constraint independently. When both are enforced, there is a loss of expressiveness, and we derive approximation bounds that quantify how accurately such a constrained network can approximate a continuous function.
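To make the architecture in the abstract concrete, here is a minimal sketch, not the authors' implementation: a shallow network whose activation is the flow map of a neural ODE, approximated with explicit Euler steps. The vector field z' = tanh(V z + v), the widths, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 2, 16     # input dimension and hidden width (illustrative choices)

# Shallow network f(x) = C * Phi_T(A x + b), with Phi_T the flow map of the
# neural ODE z'(t) = tanh(V z(t) + v) on R^m at final time T.
A = rng.standard_normal((m, d))
b = rng.standard_normal(m)
C = rng.standard_normal((1, m))
V = rng.standard_normal((m, m)) / np.sqrt(m)
v = rng.standard_normal(m)

def flow_map(z0, T=1.0, n_steps=50):
    """Approximate Phi_T(z0) with explicit Euler steps."""
    z, h = z0, T / n_steps
    for _ in range(n_steps):
        z = z + h * np.tanh(V @ z + v)
    return z

def shallow_net(x):
    """Shallow network whose activation is the neural ODE's flow map."""
    return C @ flow_map(A @ x + b)

# The two constraints studied in the paper can be monitored here: the field
# z -> tanh(V z + v) is ||V||_2-Lipschitz (tanh' <= 1), so by Gronwall the
# flow map is at most exp(T * ||V||_2)-Lipschitz; the weight norms ||A||_2
# and ||C||_2 can separately be kept below a prescribed budget.
print(shallow_net(np.array([0.5, -1.0])))
```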
Related papers
- Dense Neural Networks are not Universal Approximators [53.27010448621372]
We show that dense neural networks are not universal approximators of arbitrary continuous functions. We consider ReLU neural networks subject to natural constraints on the weights and on the input and output dimensions.
arXiv Detail & Related papers (2026-02-07T16:52:38Z)
- Quantitative Flow Approximation Properties of Narrow Neural ODEs [0.5439020425819]
We revisit the flow approximation properties of neural ordinary differential equations (NODEs). We derive how narrow NODEs can approximate the flows of shallow but wide NODEs, and we estimate the number of switches the time-dependent weights of a narrow NODE need to mimic a NODE whose velocity field is a single-layer wide neural network (a hedged sketch of such switched weights follows this entry).
arXiv Detail & Related papers (2025-03-06T03:54:42Z)
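The switching idea in the summary above can be pictured with a small sketch: a narrow NODE whose weights are piecewise constant in time, switching K times over [0, T]. This is a minimal illustration under assumed choices (tanh field, Euler integration), not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T = 2, 4, 1.0   # state dimension, number of weight pieces, final time

# One (weight, bias) pair per switching interval [k*T/K, (k+1)*T/K).
Ws = [rng.standard_normal((d, d)) for _ in range(K)]
bs = [rng.standard_normal(d) for _ in range(K)]

def narrow_node_flow(z0, steps_per_piece=25):
    """Integrate z'(t) = tanh(W(t) z + b(t)) with piecewise-constant weights."""
    z = z0
    h = (T / K) / steps_per_piece
    for W, b in zip(Ws, bs):          # weights switch K times over [0, T]
        for _ in range(steps_per_piece):
            z = z + h * np.tanh(W @ z + b)
    return z

print(narrow_node_flow(np.array([1.0, 0.0])))
```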
- Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces [0.0]
We consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks.
We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function.
arXiv Detail & Related papers (2024-05-10T14:31:58Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Lipschitz constant estimation for 1D convolutional neural networks [0.0]
We propose a dissipativity-based method for Lipschitz constant estimation of 1D convolutional neural networks (CNNs).
In particular, we analyze the dissipativity properties of convolutional, pooling, and fully connected layers.
arXiv Detail & Related papers (2022-11-28T12:09:06Z)
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the mean-field (MF) limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z)
- Sobolev-type embeddings for neural network approximation spaces [5.863264019032882]
We consider neural network approximation spaces that classify functions according to the rate at which they can be approximated.
We prove embedding theorems between these spaces for different values of $p$.
We find that, analogous to the case of classical function spaces, it is possible to trade "smoothness" (i.e., approximation rate) for increased integrability.
arXiv Detail & Related papers (2021-10-28T17:11:38Z)
- Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay [4.042159113348107]
We first consider the case of a single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associated kernel.
We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z)
- Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks [0.0]
We prove sharp lower bounds on the approximation rates for shallow neural networks.
These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU.
arXiv Detail & Related papers (2021-06-28T22:01:42Z)
- Deep neural network approximation of analytic functions [91.3755431537592]
We establish an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
- The Representation Power of Neural Networks: Breaking the Curse of Dimensionality [0.0]
We prove upper bounds on the number of parameters shallow and deep neural networks need to approximate Korobov functions.
We further prove that these bounds nearly match the minimal number of parameters any continuous function approximator needs to approximate Korobov functions.
arXiv Detail & Related papers (2020-12-10T04:44:07Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
- Deep Neural Networks with Trainable Activations and Controlled Lipschitz Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper bound on the Lipschitz constant; one standard way to enforce such a bound is sketched after this entry.
We numerically compare our scheme with standard ReLU networks and their variants, PReLU and LeakyReLU.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)
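As a companion to the last entry, here is a minimal sketch of one standard way to control an upper bound on a network's Lipschitz constant: since the product of layer spectral norms bounds the network's Lipschitz constant when the activations are 1-Lipschitz, each weight matrix is projected onto a spectral-norm ball. The helper names are illustrative assumptions; this is not the paper's variational framework.

```python
import numpy as np

def project_spectral_norm(W, max_norm=1.0):
    """Rescale W so its largest singular value is at most max_norm."""
    sigma = np.linalg.norm(W, 2)      # largest singular value
    return W if sigma <= max_norm else W * (max_norm / sigma)

def lipschitz_upper_bound(weights):
    """Product of layer spectral norms: a Lipschitz upper bound when the
    activations between layers are 1-Lipschitz (e.g. ReLU, tanh)."""
    return float(np.prod([np.linalg.norm(W, 2) for W in weights]))

rng = np.random.default_rng(2)
weights = [rng.standard_normal((8, 8)) for _ in range(3)]
weights = [project_spectral_norm(W, 1.0) for W in weights]  # after each update
print(lipschitz_upper_bound(weights))  # <= 1.0 by construction
```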