Related papers: On the Approximation Power of SiLU Networks: Exponential Rates and Depth Efficiency

URL: http://arxiv.org/abs/2512.12132v1
Date: Sat, 13 Dec 2025 01:56:34 GMT
Title: On the Approximation Power of SiLU Networks: Exponential Rates and Depth Efficiency
Authors: Koffi O. Ayena
Abstract summary: This article establishes a comprehensive theoretical framework demonstrating that SiLU activation networks achieve exponential approximation rates for smooth functions. We develop a novel hierarchical construction beginning with an efficient approximation of the square function $x^2$ that is more compact in depth and size than comparable ReLU realizations. We then extend this approach through functional composition to establish sharp approximation bounds for deep SiLU networks in approximating Sobolev-class functions.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This article establishes a comprehensive theoretical framework demonstrating that SiLU (Sigmoid Linear Unit) activation networks achieve exponential approximation rates for smooth functions with explicit and improved complexity control compared to classical ReLU-based constructions. We develop a novel hierarchical construction beginning with an efficient approximation of the square function $x^2$ more compact in depth and size than comparable ReLU realizations, such as those given by Yarotsky. This construction yields an approximation error decaying as $\mathcal{O}(\omega^{-2k})$ using networks of depth $\mathcal{O}(1)$. We then extend this approach through functional composition to establish sharp approximation bounds for deep SiLU networks in approximating Sobolev-class functions, with total depth $\mathcal{O}(1)$ and size $\mathcal{O}(\varepsilon^{-d/n})$.
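
To make the role of the scaling parameter $\omega$ concrete, here is a minimal numerical sketch. It is not the paper's construction: it assumes only the elementary identity SiLU(t) + SiLU(-t) = t tanh(t/2), whose Taylor expansion gives 2ω²(SiLU(x/ω) + SiLU(-x/ω)) = x² - x⁴/(12ω²) + ..., i.e. an $\mathcal{O}(\omega^{-2})$ error from two SiLU neurons. The function names `square_via_silu` and `max_error` are hypothetical, chosen for illustration.

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def square_via_silu(x: float, omega: float) -> float:
    """Two-neuron SiLU approximation of x**2 (illustrative assumption,
    not the construction from the paper)."""
    t = x / omega
    return 2.0 * omega**2 * (silu(t) + silu(-t))


def max_error(omega: float, n: int = 201) -> float:
    """Largest absolute error against x**2 on a uniform grid over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return max(abs(square_via_silu(x, omega) - x * x) for x in xs)


if __name__ == "__main__":
    for omega in (1.0, 2.0, 4.0, 8.0, 16.0):
        print(f"omega={omega:5.1f}  max error on [-1, 1] = {max_error(omega):.3e}")
```

Doubling `omega` should shrink the printed error by roughly a factor of four, consistent with the $\omega^{-2}$ leading term; the higher-order rate $\omega^{-2k}$ stated in the abstract would require the paper's own construction, which is not reproduced here.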

Related papers

Convergence Analysis of the PAGE Stochastic Algorithm for Weakly Convex Finite-Sum Optimization [56.57092765118707]
The algorithm was designed to find stationary points of averages of work of this type. It provides a continuous non-smooth case ($tautauL$) which improves between the general framework ($tautauL$) and the rate of change.
arXiv Detail & Related papers (2025-08-31T08:06:53Z)
Approximation Rates in Besov Norms and Sample-Complexity of Kolmogorov-Arnold Networks with Residual Connections [9.817834520159936]
Kolmogorov-Arnold Networks (KANs) have emerged as an improved backbone for most deep learning frameworks. We show that KANs can optimally approximate any Besov function in $B^s_{p,q}(\mathcal{X})$ on a bounded open, or even fractal, domain.
arXiv Detail & Related papers (2025-04-21T14:02:59Z)
On the Convergence of Single-Timescale Actor-Critic [49.19842488693726]
We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for the infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. We demonstrate that the step sizes for both the actor and critic must decay as ( O(k-Pfrac12) ) with $k$, diverging from the conventional ( O(k-Pfrac12) ) rates commonly used in (non-optimal) Markov framework optimization.
arXiv Detail & Related papers (2024-10-11T14:46:29Z)
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
Functional SDE approximation inspired by a deep operator network architecture [0.0]
A novel approach to approximate solutions of Stochastic Differential Equations (SDEs) by Deep Neural Networks is derived and analysed. The architecture is inspired by the notion of Deep Operator Networks (DeepONets), which is based on operator learning in terms of a reduced basis also represented in the network. The proposed SDEONet architecture aims to alleviate the issue of exponential complexity by learning an optimal sparse truncation of the Wiener chaos expansion.
arXiv Detail & Related papers (2024-02-05T14:12:35Z)
Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity [24.45688490844496]
The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization. In this work, we first extend the typical iteration complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization.
arXiv Detail & Related papers (2023-05-23T15:26:36Z)
Optimal Approximation Complexity of High-Dimensional Functions with Neural Networks [3.222802562733787]
We investigate properties of neural networks that use both ReLU and $x^2$ as activation functions. We show how to leverage low local dimensionality in some contexts to overcome the curse of dimensionality, obtaining approximation rates that are optimal for unknown lower-dimensional subspaces.
arXiv Detail & Related papers (2023-01-30T17:29:19Z)
Exponential Separations in Symmetric Neural Networks [48.80300074254758]
We consider the symmetric network architecture of Santoro et al. (2017) as a natural generalization of the DeepSets architecture of Zaheer et al. Under the restriction to analytic activation functions, we construct a symmetric function acting on sets of size $N$ with elements in dimension $D$.
arXiv Detail & Related papers (2022-06-02T19:45:10Z)
Quantitative approximation results for complex-valued neural networks [0.0]
We show that complex-valued neural networks with the modReLU activation function $\sigma(z) = \mathrm{ReLU}(|z| - 1)\, z / |z|$ can uniformly approximate complex-valued functions of regularity $C^n$ on compact subsets of $\mathbb{C}^d$, giving explicit bounds on the approximation rate.
arXiv Detail & Related papers (2021-02-25T18:57:58Z)
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning. In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function. Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation. We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure. Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
Deep Network Approximation for Smooth Functions [9.305095040004156]
We show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f \in C^s([0,1]^d)$ with a nearly optimal approximation error. Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N \in \mathbb{N}^+$ and $L \in \mathbb{N}^+$, respectively.
arXiv Detail & Related papers (2020-01-09T15:06:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.