Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
- URL: http://arxiv.org/abs/2510.08456v1
- Date: Thu, 09 Oct 2025 17:03:00 GMT
- Title: Integral Signatures of Activation Functions: A 9-Dimensional Taxonomy and Stability Theory for Deep Learning
- Authors: Ankur Mali, Lawrence Hall, Jake Williams, Gordon Richards
- Abstract summary: Activation functions govern the expressivity and stability of neural networks. We propose a rigorous framework for their classification via a nine-dimensional integral signature S_sigma(phi). Our framework provides principled design guidance, moving activation choice from trial-and-error to provable stability and kernel conditioning.
- Score: 0.22399170518036912
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Activation functions govern the expressivity and stability of neural networks, yet existing comparisons remain largely heuristic. We propose a rigorous framework for their classification via a nine-dimensional integral signature S_sigma(phi), combining Gaussian propagation statistics (m1, g1, g2, m2, eta), asymptotic slopes (alpha_plus, alpha_minus), and regularity measures (TV(phi'), C(phi)). This taxonomy establishes well-posedness, affine reparameterization laws with bias, and closure under bounded slope variation. Dynamical analysis yields Lyapunov theorems with explicit descent constants and identifies variance stability regions through (m2', g2). From a kernel perspective, we derive dimension-free Hessian bounds and connect smoothness to bounded variation of phi'. Applying the framework, we classify eight standard activations (ReLU, leaky-ReLU, tanh, sigmoid, Swish, GELU, Mish, TeLU), proving sharp distinctions between saturating, linear-growth, and smooth families. Numerical Gauss-Hermite and Monte Carlo validation confirms theoretical predictions. Our framework provides principled design guidance, moving activation choice from trial-and-error to provable stability and kernel conditioning.
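The Gaussian propagation entries of the signature lend themselves to direct numerical estimation, along the same lines as the paper's Gauss-Hermite and Monte Carlo validation. Below is a minimal sketch, assuming the usual Gaussian-moment readings of m1, m2, g1, g2 (first and second moments of phi(Z) and phi'(Z) for Z ~ N(0,1)) and finite-radius proxies for the asymptotic slopes alpha_plus, alpha_minus; the exact definition of eta is left to the paper and omitted here.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gh_expect(f, n=80):
    """E[f(Z)] for Z ~ N(0,1) via Gauss-Hermite: (1/sqrt(pi)) * sum w_i f(sqrt(2) x_i)."""
    x, w = hermgauss(n)
    return float(np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi))

def numeric_grad(phi, x, h=1e-5):
    # Central-difference stand-in for phi'; fine away from kinks of width ~h.
    return (phi(x + h) - phi(x - h)) / (2.0 * h)

def signature_stats(phi, slope_radius=50.0):
    """Estimate the Gaussian-propagation entries of the signature.
    alpha_plus/alpha_minus approximate the limits of phi(x)/x as x -> +/- inf."""
    dphi = lambda z: numeric_grad(phi, z)
    return {
        "m1": gh_expect(phi),
        "m2": gh_expect(lambda z: phi(z) ** 2),
        "g1": gh_expect(dphi),
        "g2": gh_expect(lambda z: dphi(z) ** 2),
        "alpha_plus": phi(slope_radius) / slope_radius,
        "alpha_minus": phi(-slope_radius) / -slope_radius,
    }

def mc_expect(f, n=1_000_000, seed=0):
    """Monte Carlo cross-check, mirroring the paper's validation strategy."""
    z = np.random.default_rng(seed).standard_normal(n)
    return float(np.mean(f(z)))

relu = lambda z: np.maximum(z, 0.0)
print(signature_stats(relu))      # m1 ~ 0.3989, g1 ~ 0.5, alpha_+ ~ 1, alpha_- ~ 0
print(signature_stats(np.tanh))   # alpha_+/- ~ 0: the saturating family
print(mc_expect(relu))            # Monte Carlo check of m1 for ReLU
```

The printed values illustrate the sharp distinction the taxonomy draws between linear-growth activations (ReLU has a nonzero positive slope) and saturating ones (tanh has vanishing slopes on both sides).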
Related papers
- MSINO: Curvature-Aware Sobolev Optimization for Manifold Neural Networks [0.0]
We introduce MSINO, a curvature-aware training framework for neural networks defined on Riemannian manifolds. We derive geometry-dependent constants that yield a Descent Lemma with a manifold Sobolev smoothness constant. MSINO provides training-time guarantees that explicitly track curvature and transported Jacobians.
arXiv Detail & Related papers (2026-02-26T12:27:00Z)
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound through two quantities.
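For readers unfamiliar with push-sum, the core trick fits in a few lines: on a directed graph a column-stochastic (rather than doubly stochastic) mixing matrix preserves total mass, and dividing by a separately mixed weight vector removes the resulting imbalance. This is a minimal consensus-only sketch with an illustrative ring topology; SGP interleaves such mixing with local stochastic gradient steps.

```python
import numpy as np

# Column-stochastic mixing matrix for a directed ring with self-loops:
# each node keeps half of its mass and pushes half to its out-neighbor.
n = 5
A = 0.5 * np.eye(n)
for i in range(n):
    A[(i + 1) % n, i] += 0.5

rng = np.random.default_rng(0)
x = rng.standard_normal(n)           # local values (e.g., local parameters)
w = np.ones(n)                       # push-sum weights correcting asymmetry
target = x.mean()

for _ in range(100):
    x = A @ x                        # push values along directed edges
    w = A @ w                        # push weights along the same edges

z = x / w                            # de-biased local estimates
print(np.max(np.abs(z - target)))    # -> ~0: all nodes agree on the average
```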
arXiv Detail & Related papers (2026-02-24T05:32:03Z)
- Beyond ReLU: Bifurcation, Oversmoothing, and Topological Priors [28.452964044443906]
Graph Neural Networks (GNNs) learn node representations through iterative network-based message-passing. Deep GNNs suffer from oversmoothing, where node features converge to a homogeneous, non-informative state. We re-frame this problem of representational collapse from a bifurcation-theory perspective, characterizing oversmoothing as convergence to a stable homogeneous fixed point.
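The collapse itself is easy to reproduce in a linear toy model: powers of a row-normalized propagation operator drive all node features toward the homogeneous fixed point. A small illustrative sketch (ring graph with self-loops; not the paper's construction):

```python
import numpy as np

# A linear toy model of message-passing: repeated neighbourhood averaging.
rng = np.random.default_rng(0)
n, d = 20, 8
A = np.zeros((n, n))
for i in range(n):                       # ring graph with self-loops
    A[i, i] = A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
P = A / A.sum(1, keepdims=True)          # row-normalized propagation operator

H = rng.standard_normal((n, d))          # initial node features
for depth in (1, 20, 200):
    Hk = np.linalg.matrix_power(P, depth) @ H
    spread = np.linalg.norm(Hk - Hk.mean(0), axis=1).mean()
    print(depth, round(spread, 4))       # spread -> 0: features homogenize
```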
arXiv Detail & Related papers (2026-02-17T15:03:28Z)
- Analytic and Variational Stability of Deep Learning Systems [0.0]
We show that uniform boundedness of stability signatures is equivalent to the existence of a Lyapunov-type energy that dissipates along the learning flow. In smooth regimes, the framework yields explicit stability exponents linking spectral norms, activation regularity, step sizes, and learning rates to contractivity of the learning dynamics. The theory extends to non-smooth learning systems, including ReLU networks, proximal and projected updates, and subgradient flows.
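As a rough illustration of the kind of quantity such stability exponents combine, the sketch below bounds the expansion of a toy deep map by the product of layer spectral norms and the activation's Lipschitz constant. Weights, depth, and scaling are hypothetical, and the bound shown is the standard composition bound, not the paper's sharper exponents.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 6, 32
# Toy deep map x -> phi(W_6 ... phi(W_1 x)); scaling chosen so the map contracts.
Ws = [0.4 * rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]

lip_phi = 1.0                                    # tanh is 1-Lipschitz
bound = lip_phi ** depth * np.prod([np.linalg.norm(W, 2) for W in Ws])
print("composition Lipschitz bound:", bound)     # < 1 certifies contractivity

def f(x):
    for W in Ws:
        x = np.tanh(W @ x)
    return x

x, y = rng.standard_normal(width), rng.standard_normal(width)
ratio = np.linalg.norm(f(x) - f(y)) / np.linalg.norm(x - y)
print("empirical expansion ratio:", ratio)       # always <= the bound above
```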
arXiv Detail & Related papers (2025-12-24T14:43:59Z)
- From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference [0.12183405753834557]
Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(x) = \int k(x \mid z)\, r(\mathrm{d}z)$. This paper develops a unified "approximation-optimization-statistics" theory for such families.
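Operationally, a semi-implicit family is defined by ancestral sampling: draw the mixing variable z from a sampleable but possibly density-free r, then draw x from the explicit kernel k(· | z). A minimal sketch with an illustrative Gaussian kernel and a pushforward mixing distribution (both chosen here for concreteness, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Semi-implicit marginal q(x) = \int k(x|z) r(dz): r is only sampleable
# (implicit), while the conditional kernel k is explicit (here Gaussian).
def sample_q(n):
    eps = rng.standard_normal(n)
    z = np.tanh(2.0 * eps) + 0.5 * eps        # implicit mixing: pushforward of N(0,1)
    return z + 0.3 * rng.standard_normal(n)   # x | z ~ N(z, 0.3^2): explicit kernel

xs = sample_q(100_000)
print(xs.mean(), xs.std())   # moments of the semi-implicit marginal
```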
arXiv Detail & Related papers (2025-12-05T19:26:25Z)
- A Mean-Field Theory of $Θ$-Expectations [2.1756081703276]
We develop a new class of calculus for such non-linear models. The Theta-Expectation is shown to be consistent with the axiom of subadditivity.
arXiv Detail & Related papers (2025-07-30T11:08:56Z)
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
- Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult [49.8719617899285]
Large learning rates, when applied to gradient descent for nonconvex optimization, yield various implicit biases, including the edge of stability.
This paper provides an initial step toward understanding these phenomena and shows that these implicit biases are in fact various tips of the same iceberg.
arXiv Detail & Related papers (2023-10-26T01:11:17Z)
- NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements.
The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively.
We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
- Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
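The threshold separating the classical regime from the territory this paper studies is visible already on a one-dimensional quadratic: GD with step size eta contracts iff eta < 2/L, sits on a period-2 orbit exactly at eta = 2/L, and diverges just above it, which is why two-step (rather than one-step) analysis becomes the natural tool. A toy illustration, not the paper's setting:

```python
import numpy as np

# Gradient descent on f(x) = (L/2) x^2: the iterate map is x -> (1 - eta*L) x.
# |1 - eta*L| < 1 iff eta < 2/L; at eta = 2/L the two-step map is the identity
# and GD oscillates between +x and -x instead of converging.
L = 4.0
for eta in (0.4, 0.5, 0.51):        # below, at, and just above 2/L = 0.5
    x = 1.0
    traj = [x]
    for _ in range(6):
        x = x - eta * L * x
        traj.append(x)
    print(f"eta={eta}: {np.round(traj, 3)}")
```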
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
- High-dimensional limit theorems for SGD: Effective dynamics and critical scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We identify a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Near the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z)
- Efficient simulation of Gottesman-Kitaev-Preskill states with Gaussian circuits [68.8204255655161]
We study the classical simulatability of Gottesman-Kitaev-Preskill (GKP) states in combination with arbitrary displacements, a large set of symplectic operations and homodyne measurements.
For these types of circuits, neither continuous-variable theorems based on the non-negativity of quasi-probability distributions nor discrete-variable theorems can be employed to assess the simulatability.
arXiv Detail & Related papers (2022-03-21T17:57:02Z)
- Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first known stability and generalization bounds for SGD with even non-differentiable loss functions.
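The object these bounds control can be probed directly: run SGD with an identical seed and sample order on a dataset and on a neighbouring dataset differing in one example, and measure the gap between the resulting models. A minimal least-squares sketch (problem, sizes, and learning rate are illustrative):

```python
import numpy as np

# Algorithmic-stability probe: train on S and on a neighbour S' differing in
# one example; the parameter gap is what on-average stability bounds control.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def sgd(X, y, lr=0.01, epochs=5, seed=1):
    w = np.zeros(d)
    order_rng = np.random.default_rng(seed)       # same seed => same sample order
    for _ in range(epochs):
        for i in order_rng.permutation(len(y)):
            w -= lr * (X[i] @ w - y[i]) * X[i]    # least-squares gradient step
    return w

X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.standard_normal(d), 0.0        # replace one example
gap = np.linalg.norm(sgd(X, y) - sgd(X2, y2))
print("parameter gap between neighbouring datasets:", gap)
```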
arXiv Detail & Related papers (2020-06-15T06:30:19Z)