Related papers: Approximation Capabilities of Feedforward Neural Networks with GELU Activations

URL: http://arxiv.org/abs/2512.21749v1
Date: Thu, 25 Dec 2025 17:56:44 GMT
Title: Approximation Capabilities of Feedforward Neural Networks with GELU Activations
Authors: Konstantin Yakovlev, Nikita Puchkin
Abstract summary: We derive an approximation error bound that holds simultaneously for a function and all its derivatives up to any prescribed order. The bounds apply to elementary functions, including multivariate polynomials, the exponential function, and the reciprocal function. We report the network size, weight magnitudes, and behavior at infinity.
Score: 6.488575826304024
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We derive an approximation error bound that holds simultaneously for a function and all its derivatives up to any prescribed order. The bounds apply to elementary functions, including multivariate polynomials, the exponential function, and the reciprocal function, and are obtained using feedforward neural networks with the Gaussian Error Linear Unit (GELU) activation. In addition, we report the network size, weight magnitudes, and behavior at infinity. Our analysis begins with a constructive approximation of multiplication, where we prove the simultaneous validity of error bounds over domains of increasing size for a given approximator. Leveraging this result, we obtain approximation guarantees for division and the exponential function, ensuring that all higher-order derivatives of the resulting approximators remain globally bounded.
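
Illustrative sketch (not taken from the paper): the abstract says the analysis starts from a constructive approximation of multiplication with GELU units. One textbook way to see why a smooth activation can do this is a second-order finite difference around zero combined with the polarization identity xy = ((x+y)^2 - (x-y)^2)/4. The code below is a minimal sketch under that assumption; the function names and the step parameter `h` are hypothetical, and the authors' actual construction and error bounds may differ.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# GELU''(0) = 2 * phi(0) = sqrt(2 / pi), where phi is the standard normal density.
GELU_SECOND_DERIV_AT_ZERO = math.sqrt(2.0 / math.pi)


def approx_square(x: float, h: float = 1e-2) -> float:
    """Approximate x**2 with two GELU units (hypothetical helper).

    Uses GELU(t) + GELU(-t) = GELU''(0) * t**2 + O(t**4) with t = h * x,
    so the relative error is of order (h * x)**2.
    """
    return (gelu(h * x) + gelu(-h * x)) / (h * h * GELU_SECOND_DERIV_AT_ZERO)


def approx_product(x: float, y: float, h: float = 1e-2) -> float:
    """Approximate x * y with four GELU units via the polarization identity."""
    return 0.25 * (approx_square(x + y, h) - approx_square(x - y, h))


if __name__ == "__main__":
    for x, y in [(1.5, -2.0), (0.3, 0.7), (4.0, 2.5)]:
        print(x, y, x * y, approx_product(x, y))
```

In this sketch a smaller `h` reduces the approximation error on a fixed input range but requires larger outer weights (of order 1/h^2), which mirrors the trade-off between accuracy and weight magnitude that the abstract says the paper reports.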

Related papers

Convergence Analysis of Max-Min Exponential Neural Network Operators in Orlicz Space [0.0]
We propose a Max-Min approach for approximating functions using exponential neural network operators. We study both pointwise and uniform convergence for univariate functions. We provide some graphical representations to illustrate the approximation error of the function through suitable kernel and sigmoidal activation functions.
arXiv Detail & Related papers (2025-08-14T00:30:56Z)
Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces [0.0]
We consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function.
arXiv Detail & Related papers (2024-05-10T14:31:58Z)
Multi-Grid Tensorized Fourier Neural Operator for High-Resolution PDEs [93.82811501035569]
We introduce a new data efficient and highly parallelizable operator learning approach with reduced memory requirement and better generalization. MG-TFNO scales to large resolutions by leveraging local and global structures of full-scale, real-world phenomena. We demonstrate superior performance on the turbulent Navier-Stokes equations where we achieve less than half the error with over 150x compression.
arXiv Detail & Related papers (2023-09-29T20:18:52Z)
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
Convex Bounds on the Softmax Function with Applications to Robustness Verification [69.09991317119679]
The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well. This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models.
arXiv Detail & Related papers (2023-03-03T05:07:02Z)
Deep neural network approximation of analytic functions [91.3755431537592]
entropy bound for the spaces of neural networks with piecewise linear activation functions. We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z)
Non-asymptotic approximations of neural networks by Gaussian processes [7.56714041729893]
We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights. As the width of a network goes to infinity, its law converges to that of a Gaussian process.
arXiv Detail & Related papers (2021-02-17T10:19:26Z)
Approximation with Neural Networks in Variable Lebesgue Spaces [1.0152838128195465]
This paper concerns the universal approximation property with neural networks in variable Lebesgue spaces. We show that, whenever the exponent function of the space is bounded, every function can be approximated with shallow neural networks with any desired accuracy.
arXiv Detail & Related papers (2020-07-08T14:52:48Z)
Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data. We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
On Sharpness of Error Bounds for Multivariate Neural Network Approximation [0.0]
The paper deals with best non-linear approximation by sums of ridge functions. Error bounds are presented in terms of moduli of smoothness.
arXiv Detail & Related papers (2020-04-05T14:00:52Z)
SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.