Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks
- URL: http://arxiv.org/abs/2506.18588v1
- Date: Mon, 23 Jun 2025 12:49:13 GMT
- Title: Optimization-Induced Dynamics of Lipschitz Continuity in Neural Networks
- Authors: Róisín Luo, James McDermott, Christian Gagné, Qiang Sun, Colm O'Riordan
- Abstract summary: Lipschitz continuity characterizes the worst-case sensitivity of neural networks to small input perturbations. We present a rigorous framework to model the temporal evolution of Lipschitz continuity during training with stochastic gradient descent.
- Score: 7.486235601021366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lipschitz continuity characterizes the worst-case sensitivity of neural networks to small input perturbations; yet its dynamics (i.e. temporal evolution) during training remains under-explored. We present a rigorous mathematical framework to model the temporal evolution of Lipschitz continuity during training with stochastic gradient descent (SGD). This framework leverages a system of stochastic differential equations (SDEs) to capture both deterministic and stochastic forces. Our theoretical analysis identifies three principal factors driving the evolution: (i) the projection of gradient flows, induced by the optimization dynamics, onto the operator-norm Jacobian of parameter matrices; (ii) the projection of gradient noise, arising from the randomness in mini-batch sampling, onto the operator-norm Jacobian; and (iii) the projection of the gradient noise onto the operator-norm Hessian of parameter matrices. Furthermore, our theoretical framework sheds light on how factors such as noisy supervision, parameter initialization, batch size, and mini-batch sampling trajectories shape the evolution of the Lipschitz continuity of neural networks. Our experimental results demonstrate strong agreement between the theoretical implications and the observed behaviors.
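As a concrete companion to the abstract above, the sketch below tracks the standard layer-wise upper bound on a network's Lipschitz constant, the product of the operator (spectral) norms of its weight matrices, across SGD steps on a toy two-layer ReLU regression problem. The data, architecture, and hyperparameters are hypothetical and the code is not from the paper; it only illustrates the quantity whose temporal evolution the framework models.

```python
# A minimal sketch (not the paper's code): track an upper bound on the Lipschitz
# constant of a toy two-layer ReLU network during SGD. For 1-Lipschitz activations,
# the product of the layers' operator (spectral) norms upper-bounds the network's
# Lipschitz constant; the paper models how this kind of quantity evolves in time.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical regression data and a small two-layer network.
X = rng.normal(size=(256, 10))
y = np.sin(X @ rng.normal(size=10))
W1 = rng.normal(scale=0.5, size=(32, 10))
W2 = rng.normal(scale=0.5, size=(1, 32))

def spectral_norm(W):
    # Largest singular value = operator norm induced by the Euclidean norm.
    return np.linalg.svd(W, compute_uv=False)[0]

lr, batch, lipschitz_bound = 0.05, 32, []
for step in range(500):
    idx = rng.choice(len(X), size=batch, replace=False)  # mini-batch sampling noise
    xb, yb = X[idx], y[idx]
    h = np.maximum(xb @ W1.T, 0.0)                        # ReLU hidden layer
    pred = (h @ W2.T).ravel()
    err = pred - yb
    # Backpropagate the mean-squared error by hand.
    gW2 = (err[:, None] * h).mean(axis=0, keepdims=True)
    gh = err[:, None] * W2 * (h > 0)
    gW1 = gh.T @ xb / batch
    W1 -= lr * gW1
    W2 -= lr * gW2
    lipschitz_bound.append(spectral_norm(W1) * spectral_norm(W2))

print("bound at start / end:", lipschitz_bound[0], lipschitz_bound[-1])
```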
Related papers
- Quantum-Inspired Tensor Networks for Approximating PDE Flow Maps [1.7887197093662073]
We investigate quantum-inspired tensor networks (QTNs) for approximating flow maps of hydrodynamic partial differential equations (PDEs). Motivated by the effective low-rank structure that emerges after tensorization, we encode PDE states as matrix product states (MPS). Experiments on one- and two-dimensional linear advection-diffusion and nonlinear viscous Burgers equations demonstrate accurate short-horizon prediction, favorable scaling in smooth diffusive regimes, and error growth in nonlinear multi-step predictions.
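A generic illustration of the MPS encoding mentioned above (not the paper's implementation): a 1-D field sampled on 2**n points is reshaped into an n-way tensor with one size-2 leg per level and compressed left-to-right with truncated SVDs. The maximum bond dimension `max_rank` and the Gaussian test field are assumed for illustration.

```python
# A generic sketch (not the paper's code) of encoding a discretized 1-D field as a
# matrix product state (MPS): reshape the 2**n samples into an n-way tensor with
# dimension 2 per leg, then compress left-to-right with truncated SVDs.
import numpy as np

def to_mps(field, max_rank=8):
    n = int(np.log2(field.size))
    cores, rank = [], 1
    rest = field.reshape(rank * 2, -1)
    for _ in range(n - 1):
        U, s, Vt = np.linalg.svd(rest, full_matrices=False)
        r = min(max_rank, int(np.sum(s > 1e-12)))
        cores.append(U[:, :r].reshape(rank, 2, r))
        rank = r
        rest = (s[:r, None] * Vt[:r]).reshape(rank * 2, -1)
    cores.append(rest.reshape(rank, 2, 1))
    return cores

def from_mps(cores):
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([out.ndim - 1], [0]))
    return out.reshape(-1)

x = np.linspace(0.0, 1.0, 2**10, endpoint=False)
u = np.exp(-100 * (x - 0.5) ** 2)            # a smooth stand-in for a PDE state
cores = to_mps(u, max_rank=8)
print("max reconstruction error:", np.abs(from_mps(cores) - u).max())
```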
arXiv Detail & Related papers (2026-02-16T10:06:19Z) - ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
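The entry casts LoRA factor updates as a continuous-time dynamic; the hedged sketch below is not ODELoRA itself but shows the general idea under the simplest assumption: the factors A and B follow the gradient flow dA/dt = -dL/dA, dB/dt = -dL/dB, integrated with an explicit Euler step, with a frozen base weight W0 and hypothetical adaptation data.

```python
# A hedged sketch (not ODELoRA itself): view the LoRA factors as following a
# continuous-time gradient flow and integrate it with explicit Euler steps;
# the base weight W0 stays frozen, as in LoRA.
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 16, 32, 4

W0 = rng.normal(scale=0.2, size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))        # LoRA down-projection
B = np.zeros((d_out, r))                           # LoRA up-projection (zero init)

X = rng.normal(size=(128, d_in))                   # hypothetical adaptation data
Y = X @ rng.normal(scale=0.2, size=(d_in, d_out))

def loss_and_grads(A, B):
    pred = X @ (W0 + B @ A).T
    err = pred - Y                                 # (N, d_out)
    loss = 0.5 * np.mean(np.sum(err**2, axis=1))
    G = err.T @ X / len(X)                         # dL/dW with W = W0 + B @ A
    return loss, B.T @ G, G @ A.T                  # chain rule: dL/dA, dL/dB

dt, steps = 0.1, 200                                # Euler step size for the ODE
for t in range(steps):
    loss, gA, gB = loss_and_grads(A, B)
    A, B = A - dt * gA, B - dt * gB                 # one explicit Euler step
print("final loss:", loss_and_grads(A, B)[0])
```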
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - Kantorovich-Type Stochastic Neural Network Operators for the Mean-Square Approximation of Certain Second-Order Stochastic Processes [0.0]
We construct a new class of Kantorovich-type Stochastic Neural Network Operators (K-SNNOs) in which randomness is incorporated not at the coefficient level, but through stochastic neurons. This framework enables the operator to inherit the probabilistic structure of the underlying process, making it suitable for modeling and approximating stochastic signals.
arXiv Detail & Related papers (2026-01-07T06:25:40Z) - Convergence of Stochastic Gradient Langevin Dynamics in the Lazy Training Regime [4.297070083645049]
Continuous-time models provide insights into the training dynamics of optimization algorithms in deep learning. We establish a non-asymptotic convergence analysis of stochastic gradient Langevin dynamics (SGLD). We show that, under regularity conditions on the Hessian of the loss function, SGLD with multiplicative and state-dependent noise yields a non-degenerate kernel throughout the training process with high probability.
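For reference, a minimal sketch of the canonical SGLD update analyzed in this line of work: a mini-batch gradient step plus Gaussian noise scaled by sqrt(2 * eta / beta). The quadratic toy problem, batch size, and inverse temperature below are illustrative assumptions, not the paper's setting.

```python
# A minimal sketch of the canonical SGLD update:
# theta <- theta - eta * grad_minibatch + sqrt(2 * eta / beta) * N(0, I),
# shown on a toy least-squares problem with hypothetical data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(512, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=512)

theta = np.zeros(5)
eta, beta, batch = 1e-2, 1e4, 64                            # step size, inverse temperature
for step in range(2000):
    idx = rng.choice(len(X), size=batch, replace=False)
    grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch      # mini-batch gradient
    noise = np.sqrt(2.0 * eta / beta) * rng.normal(size=theta.shape)
    theta = theta - eta * grad + noise                        # SGLD step
print("posterior-mode estimate:", np.round(theta, 3))
```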
arXiv Detail & Related papers (2025-10-24T08:28:53Z) - Stochastic Fractional Neural Operators: A Symmetrized Approach to Modeling Turbulence in Complex Fluid Dynamics [0.0]
We introduce a new class of neural network operators designed to handle problems where memory effects and randomness play a central role. Our approach provides theoretical guarantees for the approximation quality and suggests that these neural operators can serve as effective tools in the analysis and simulation of complex systems.
arXiv Detail & Related papers (2025-05-12T13:11:08Z) - A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks [3.48097307252416]
We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. We show that the Levenberg-Marquardt dynamics with an appropriately chosen damping schedule yields a fast convergence rate despite potentially ill-conditioned neural tangent kernel matrices.
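A generic sketch of the damped Gauss-Newton (Levenberg-Marquardt) update family the entry analyzes: delta = -(J^T J + lambda * I)^{-1} J^T r, with the damping lambda adapted by a simple accept/reject schedule. The exponential-fit example is hypothetical and is not a neural-network training run.

```python
# A generic damped Gauss-Newton (Levenberg-Marquardt) step on a small nonlinear
# least-squares fit: delta = -(J^T J + lam * I)^{-1} J^T r for residuals r, Jacobian J.
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-0.7 * t) + 0.02 * rng.normal(size=t.size)   # hypothetical data

def residuals_and_jacobian(p):
    a, b = p
    model = a * np.exp(-b * t)
    r = model - y
    J = np.stack([np.exp(-b * t), -a * t * np.exp(-b * t)], axis=1)  # dr/da, dr/db
    return r, J

p, lam = np.array([1.0, 1.0]), 1e-2
for it in range(30):
    r, J = residuals_and_jacobian(p)
    delta = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)     # LM step
    p_new = p + delta
    if np.sum(residuals_and_jacobian(p_new)[0]**2) < np.sum(r**2):
        p, lam = p_new, lam * 0.5          # accept step, relax damping
    else:
        lam *= 2.0                          # reject step, increase damping
print("fitted (a, b):", np.round(p, 3))
```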
arXiv Detail & Related papers (2024-12-18T16:51:47Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching.
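A hedged sketch of without-replacement minibatching, the sampling scheme highlighted above: each epoch shuffles the data once and walks through disjoint mini-batches, so every example is visited exactly once per epoch. Here it drives a Langevin-style update on a toy least-squares problem; the problem and hyperparameters are assumptions, not the paper's.

```python
# Without-replacement minibatching: one shuffle per epoch, disjoint mini-batches,
# combined with a Langevin-style noisy gradient update on a toy problem.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(256, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.0])

theta, eta, beta, batch = np.zeros(4), 5e-3, 1e4, 32
for epoch in range(200):
    order = rng.permutation(len(X))                      # one shuffle per epoch
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]                  # disjoint, without replacement
        grad = X[idx].T @ (X[idx] @ theta - y[idx]) / len(idx)
        noise = np.sqrt(2.0 * eta / beta) * rng.normal(size=theta.shape)
        theta = theta - eta * grad + noise
print("estimate after without-replacement epochs:", np.round(theta, 3))
```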
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Learning Discretized Neural Networks under Ricci Flow [48.47315844022283]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations. DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
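A tiny numeric illustration of the failure mode described above (not the paper's Ricci-flow remedy): a rounding quantizer is piecewise constant, so its derivative is zero almost everywhere and naive backpropagation through it returns zero weight gradients.

```python
# Rounding to a low-precision grid has zero derivative almost everywhere, so
# gradients that pass through the quantizer vanish and the weight never updates.
import numpy as np

def quantize(w, step=0.25):
    return step * np.round(w / step)        # low-precision weights on a fixed grid

w, eps = 0.4, 1e-4
finite_diff = (quantize(w + eps) - quantize(w - eps)) / (2 * eps)
print("d quantize / d w at w=0.4:", finite_diff)    # 0.0: the local slope vanishes

# Consequence for a one-weight "network" f(x) = quantize(w) * x with loss (f(x) - y)^2:
x_in, y_target = 1.0, 1.0
loss_grad = 2 * (quantize(w) * x_in - y_target) * x_in * finite_diff
print("backprop gradient w.r.t. w:", loss_grad)      # also 0.0
```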
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging [6.512092052306553]
Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and temperature monitoring schemes.
We devise a novel flexible hierarchical Bayesian framework within which the spatio-temporal dynamics of model parameters and noise are modeled.
arXiv Detail & Related papers (2021-11-02T15:50:01Z) - The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z) - Neural Stochastic Contraction Metrics for Learning-based Control and Estimation [13.751135823626493]
The NSCM framework allows autonomous agents to approximate optimal stable control and estimation policies in real-time.
It outperforms existing nonlinear control and estimation techniques including the state-dependent Riccati equation, iterative LQR, EKF, and the neural contraction metric.
arXiv Detail & Related papers (2020-11-06T03:04:42Z) - Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
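A simplified sketch in the spirit of a Lipschitz-constrained recurrent unit (the paper's exact parametrization may differ): the hidden state follows h' = A h + tanh(W h + U x + b), where A and W are built as skew-symmetric matrices minus a damping term so the linear part is non-expansive, and the ODE is discretized with explicit Euler. The dimensions, damping constant, and perturbation test below are assumptions for illustration.

```python
# A simplified Lipschitz-style recurrent cell: skew-symmetric-minus-damping recurrent
# matrices keep the hidden dynamics well-behaved; we check sensitivity to a small
# input perturbation, the kind of robustness the entry reports.
import numpy as np

rng = np.random.default_rng(5)
d_h, d_x, gamma, dt = 16, 3, 0.8, 0.1

def stable_matrix(M, gamma):
    return 0.5 * (M - M.T) - gamma * np.eye(M.shape[0])    # skew-symmetric minus damping

A = stable_matrix(rng.normal(scale=0.5, size=(d_h, d_h)), gamma)
W = stable_matrix(rng.normal(scale=0.5, size=(d_h, d_h)), gamma)
U = rng.normal(scale=0.5, size=(d_h, d_x))
b = np.zeros(d_h)

def run(x_seq):
    h = np.zeros(d_h)
    for x in x_seq:                                         # Euler steps of the hidden ODE
        h = h + dt * (A @ h + np.tanh(W @ h + U @ x + b))
    return h

x_seq = rng.normal(size=(50, d_x))
h_clean = run(x_seq)
h_perturbed = run(x_seq + 1e-3 * rng.normal(size=x_seq.shape))
print("output shift for a small input perturbation:", np.linalg.norm(h_perturbed - h_clean))
```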
arXiv Detail & Related papers (2020-06-22T08:44:52Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
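A generic sketch of a matrix flow on the orthogonal group O(d), the structure the entry describes for the time-dependent parameters (not the paper's training code): pushing Q along a skew-symmetric generator with a Cayley step preserves orthogonality up to numerical round-off.

```python
# Cayley step Q <- (I - dt/2 * S)^{-1} (I + dt/2 * S) Q with skew-symmetric S:
# each step maps an orthogonal matrix to an orthogonal matrix, so the flow stays on O(d).
import numpy as np

rng = np.random.default_rng(6)
d, dt = 8, 0.05
I = np.eye(d)

Q = np.linalg.qr(rng.normal(size=(d, d)))[0]        # start from a random orthogonal matrix
M = rng.normal(size=(d, d))
S = M - M.T                                          # skew-symmetric generator of the flow

for step in range(100):
    Q = np.linalg.solve(I - 0.5 * dt * S, (I + 0.5 * dt * S) @ Q)   # Cayley step

print("deviation from orthogonality:", np.linalg.norm(Q.T @ Q - I))
```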
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of stochasticity in its success remains unclear.
We show that multiplicative noise, as it commonly arises from variance in local rates of convergence, results in heavy-tailed behavior in the parameters.
A detailed analysis is conducted on key factors, including step size, batch size, and data, all of which exhibit similar behavior on state-of-the-art neural network models.
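A toy simulation of the mechanism this entry points to (not the paper's experiments): an iterate driven by random multiplicative and additive terms, a Kesten-type recursion, develops heavy-tailed stationary fluctuations even though each individual noise term is Gaussian. The contraction factor 0.9 and noise scale 0.35 are arbitrary illustrative choices.

```python
# x <- a * x + b with a random multiplicative factor a: with purely additive noise the
# stationary distribution is Gaussian (kurtosis near 3); switching on multiplicative
# noise drives the sample kurtosis far above 3, the signature of heavy tails.
import numpy as np

rng = np.random.default_rng(7)
n_chains, n_steps = 50000, 500

def simulate(mult_noise):
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        a = 0.9 + mult_noise * rng.normal(size=n_chains)   # random per-step contraction
        b = rng.normal(size=n_chains)                        # additive noise
        x = a * x + b
    z = (x - x.mean()) / x.std()
    return np.mean(z**4)                                     # sample kurtosis

print("kurtosis, additive noise only :", round(simulate(0.0), 1))
print("kurtosis, multiplicative noise:", round(simulate(0.35), 1))
```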
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)