Global Convergence in Neural ODEs: Impact of Activation Functions
- URL: http://arxiv.org/abs/2509.22436v1
- Date: Fri, 26 Sep 2025 14:54:48 GMT
- Title: Global Convergence in Neural ODEs: Impact of Activation Functions
- Authors: Tianxiang Gao, Siyuan Sun, Hailiang Liu, Hongyang Gao
- Abstract summary: We show that the properties of activation functions, specifically smoothness and nonlinearity, are critical to the training dynamics. Our theoretical findings are validated by numerical experiments, which support our analysis and also provide practical guidelines for scaling Neural ODEs.
- Score: 19.19928901546021
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Ordinary Differential Equations (ODEs) have been successful in various applications due to their continuous nature and parameter-sharing efficiency. However, these unique characteristics also introduce challenges in training, particularly with respect to gradient computation accuracy and convergence analysis. In this paper, we address these challenges by investigating the impact of activation functions. We demonstrate that the properties of activation functions, specifically smoothness and nonlinearity, are critical to the training dynamics. Smooth activation functions guarantee globally unique solutions for both forward and backward ODEs, while sufficient nonlinearity is essential for maintaining the spectral properties of the Neural Tangent Kernel (NTK) during training. Together, these properties enable us to establish the global convergence of Neural ODEs under gradient descent in overparameterized regimes. Our theoretical findings are validated by numerical experiments, which not only support our analysis but also provide practical guidelines for scaling Neural ODEs, potentially leading to faster training and improved performance in real-world applications.
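As a concrete illustration of the forward dynamics the abstract describes, here is a minimal sketch of integrating a neural ODE with a smooth activation. The width, the $1/\sqrt{m}$ scaling, and the RK4 integrator are illustrative assumptions on my part, not the paper's exact formulation.

```python
import numpy as np

def neural_ode_rhs(z, W, V, m):
    """Vector field f(z) = V @ tanh(W @ z) / sqrt(m).

    tanh is smooth, so both the forward ODE and the adjoint (backward)
    ODE have Lipschitz right-hand sides and hence globally unique
    solutions; a nonsmooth activation like ReLU has a discontinuous
    derivative, which breaks this guarantee for the backward pass.
    """
    return V @ np.tanh(W @ z) / np.sqrt(m)

def rk4_integrate(z0, W, V, m, t1=1.0, steps=100):
    """Classic fourth-order Runge-Kutta from t=0 to t=t1."""
    h = t1 / steps
    z = z0.copy()
    for _ in range(steps):
        k1 = neural_ode_rhs(z, W, V, m)
        k2 = neural_ode_rhs(z + 0.5 * h * k1, W, V, m)
        k3 = neural_ode_rhs(z + 0.5 * h * k2, W, V, m)
        k4 = neural_ode_rhs(z + h * k3, W, V, m)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

rng = np.random.default_rng(0)
d, m = 4, 256                      # input dim, hidden width (large m: overparameterized regime)
W = rng.standard_normal((m, d))    # NTK-style N(0, 1) initialization
V = rng.standard_normal((d, m))
z0 = rng.standard_normal(d)
print(rk4_integrate(z0, W, V, m))
```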
Related papers
- A Neural Network for the Identical Kuramoto Equation: Architectural Considerations and Performance Evaluation [0.0]
We investigate the efficiency of Deep Neural Networks (DNNs) to approximate the solution of a nonlocal conservation law derived from the identical-oscillator Kuramoto model. Through systematic experimentation, we demonstrate that network configuration parameters influence convergence characteristics. We identify fundamental limitations of standard feed-forward architectures when handling singular or piecewise-constant solutions.
arXiv Detail & Related papers (2025-09-17T19:37:01Z)
- Generative System Dynamics in Recurrent Neural Networks [56.958984970518564]
We investigate the continuous-time dynamics of Recurrent Neural Networks (RNNs). We show that skew-symmetric weight matrices are fundamental to enabling stable limit cycles in both linear and nonlinear configurations. Numerical simulations showcase how nonlinear activation functions not only maintain limit cycles but also enhance the numerical stability of the system integration process.
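A quick numerical check of the skew-symmetry mechanism (my own sketch of the standard argument, not the paper's code): with $W^\top = -W$, the linear system $\dot x = Wx$ conserves $\|x\|$, so trajectories orbit rather than decay or blow up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
W = A - A.T                        # skew-symmetric: W.T == -W

x = rng.standard_normal(n)
h = 1e-3
norms = []
for _ in range(10_000):
    # Midpoint (RK2) step of the linear dynamics dx/dt = W x
    k1 = W @ x
    k2 = W @ (x + 0.5 * h * k1)
    x = x + h * k2
    norms.append(np.linalg.norm(x))

# Since d/dt ||x||^2 = 2 x.T W x = 0 for skew-symmetric W,
# the norm stays (numerically) constant along the orbit.
print(norms[0], norms[-1])
```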
arXiv Detail & Related papers (2025-04-16T10:39:43Z)
- Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization [66.03821840425539]
In this paper, we investigate the training dynamics of $L$-layer neural networks trained with stochastic gradient descent (SGD), using the tensor program framework. We show that SGD enables these networks to learn linearly independent features that substantially deviate from their initial values. This rich feature space captures relevant data information and ensures that any convergent point of the training process is a global minimum.
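One way to probe the "features deviate from their initial values" claim empirically is to compare hidden representations before and after training. The sketch below uses a plain two-layer network, toy data, and full-batch gradient descent, all assumptions of mine rather than the paper's tensor-program machinery.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 64, 8, 128
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))       # toy regression target

W = rng.standard_normal((m, d)) / np.sqrt(d)
v = rng.standard_normal(m) / np.sqrt(m)

def features(W):
    return np.tanh(X @ W.T)                  # hidden features, shape (n, m)

phi0 = features(W)
lr = 0.1
for _ in range(2000):                        # full-batch gradient descent on squared loss
    phi = features(W)
    err = phi @ v - y                        # residuals, shape (n,)
    grad_v = phi.T @ err / n
    grad_W = ((np.outer(err, v) * (1 - phi ** 2)).T @ X) / n
    v -= lr * grad_v
    W -= lr * grad_W

phi1 = features(W)
# Relative change of the feature matrix: near 0 in the lazy/NTK regime,
# substantially positive when features genuinely move (rich regime).
print(np.linalg.norm(phi1 - phi0) / np.linalg.norm(phi0))
```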
arXiv Detail & Related papers (2025-03-12T17:33:13Z)
- Learnable Activation Functions in Physics-Informed Neural Networks for Solving Partial Differential Equations [0.0]
Physics-Informed Neural Networks (PINNs) have emerged as a promising approach for solving Partial Differential Equations (PDEs). However, the limitations of fixed activation functions impact their accuracy for problems involving rapid oscillations, sharp gradients, and complex boundary behaviors. We investigate learnable activation functions as a solution to these challenges.
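One common form of learnable activation is a per-layer trainable slope $a$ inside $\tanh(ax)$, trained jointly with the weights. This is a hedged sketch in the spirit of the abstract; the exact parametrization studied in the paper may differ.

```python
import torch
import torch.nn as nn

class AdaptiveTanh(nn.Module):
    """tanh(a * x) with a trainable slope a (initialized at 1)."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return torch.tanh(self.a * x)

# Tiny PINN-style network u_theta(x); the slope parameters are updated
# by the same optimizer as the weights.
net = nn.Sequential(
    nn.Linear(1, 32), AdaptiveTanh(),
    nn.Linear(32, 32), AdaptiveTanh(),
    nn.Linear(32, 1),
)

x = torch.linspace(0.0, 1.0, 100).unsqueeze(-1).requires_grad_(True)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    u = net(x)
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    # Toy residual for u'(x) = cos(2*pi*x) with boundary condition u(0) = 0
    loss = ((u_x - torch.cos(2 * torch.pi * x)) ** 2).mean() \
         + net(torch.zeros(1, 1)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```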
arXiv Detail & Related papers (2024-11-22T18:25:13Z)
- Analysing Rescaling, Discretization, and Linearization in RNNs for Neural System Modelling [0.0]
Recurrent Neural Networks (RNNs) are widely used for modelling neural activity, yet the mathematical interplay of their core procedures (rescaling, discretization, and linearization) remains uncharacterized. This study establishes the conditions under which these procedures commute, enabling their flexible application in computational neuroscience. Our findings directly guide the design of biologically plausible RNNs for simulating neural dynamics in decision-making and motor control.
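A hedged sketch of the kind of commutation question at stake, using an assumed leaky-integrator RNN $\tau\dot x = -x + \tanh(Wx)$: does rescaling the time constant and then Euler-discretizing give the same update as discretizing first with a rescaled step size? For this particular pair the two orders agree exactly, which is the simplest instance of such a commutation condition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
W = rng.standard_normal((n, n)) / np.sqrt(n)
x0 = rng.standard_normal(n)

def euler_step(x, tau, dt):
    """One Euler step of the leaky-integrator RNN tau * dx/dt = -x + tanh(W x)."""
    return x + (dt / tau) * (-x + np.tanh(W @ x))

c, tau, dt = 2.0, 1.0, 0.01
# Path A: rescale the time constant tau -> c*tau, then discretize with step dt.
xa = euler_step(x0, c * tau, dt)
# Path B: discretize at the original tau with a rescaled step dt/c.
xb = euler_step(x0, tau, dt / c)
print(np.max(np.abs(xa - xb)))   # 0.0: the two orders commute here
```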
arXiv Detail & Related papers (2023-12-26T10:00:33Z)
- ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present the Expressive Neural Network (ENN), a novel model in which the nonlinear activation functions are modeled using the Discrete Cosine Transform (DCT).
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, with an accuracy gap of more than 40% in some scenarios.
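A minimal sketch of the idea: an activation expressed as a truncated cosine series with trainable coefficients. The number of coefficients, the input clipping, and the basis convention below are my assumptions, not the ENN paper's exact construction.

```python
import numpy as np

class DCTActivation:
    """phi(x) = sum_k c_k * cos(pi * k * (x + 1) / 2): a truncated cosine
    series on [-1, 1] whose coefficients c are the trainable parameters."""
    def __init__(self, num_coeffs=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.c = rng.standard_normal(num_coeffs) * 0.1   # trainable coefficients

    def __call__(self, x):
        x = np.clip(x, -1.0, 1.0)          # the cosine basis lives on a bounded interval
        k = np.arange(len(self.c))
        # Broadcast: evaluate the cosine basis at each (rescaled) input value
        basis = np.cos(np.pi * k * (x[..., None] + 1.0) / 2.0)
        return basis @ self.c

phi = DCTActivation()
print(phi(np.linspace(-2, 2, 5)))
```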
arXiv Detail & Related papers (2023-07-02T21:46:30Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can fail to train when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
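The core difference from vanilla SGD can be sketched in a few lines. This is a generic implicit-update sketch (toy quadratic objective, root-finder inner solve), not the paper's exact scheme: the implicit step evaluates the gradient at the *new* iterate, $\theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_{k+1})$, which must be solved at each step.

```python
import numpy as np
from scipy.optimize import fsolve

def implicit_sgd_step(theta, grad_fn, lr):
    """Solve theta_new = theta - lr * grad(theta_new) with a root finder."""
    residual = lambda th_new: th_new - theta + lr * grad_fn(th_new)
    return fsolve(residual, theta)

# Toy ill-conditioned quadratic L(theta) = 0.5 * theta^T A theta.
A = np.diag([100.0, 1.0])
grad = lambda th: A @ th

theta = np.array([1.0, 1.0])
for _ in range(20):
    theta = implicit_sgd_step(theta, grad, lr=0.5)
# Stays bounded and contracts toward 0; explicit gradient descent at
# lr = 0.5 would diverge here, since lr > 2 / lambda_max = 0.02.
print(theta)
```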
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.