On the Rate of Convergence of GD in Non-linear Neural Networks: An Adversarial Robustness Perspective
- URL: http://arxiv.org/abs/2603.02095v1
- Date: Mon, 02 Mar 2026 17:13:33 GMT
- Title: On the Rate of Convergence of GD in Non-linear Neural Networks: An Adversarial Robustness Perspective
- Authors: Guy Smorodinsky, Sveta Gimpleson, Itay Safran
- Abstract summary: We study the convergence dynamics of Gradient Descent (GD) in a minimal binary classification setting. We prove that while GD successfully converges to an optimal robustness margin, this convergence occurs at a prohibitively slow rate. Our theoretical guarantees are derived via a rigorous analysis of the GD trajectories across the distinct activation patterns of the model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the convergence dynamics of Gradient Descent (GD) in a minimal binary classification setting, consisting of a two-neuron ReLU network and two training instances. We prove that even under these strong simplifying assumptions, while GD successfully converges to an optimal robustness margin, effectively maximizing the distance between the decision boundary and the training points, this convergence occurs at a prohibitively slow rate, scaling strictly as $\Theta(1/\ln(t))$. To the best of our knowledge, this establishes the first explicit lower bound on the convergence rate of the robustness margin in a non-linear model. Through empirical simulations, we further demonstrate that this inherent failure mode is pervasive, exhibiting the exact same tight convergence rate across multiple natural network initializations. Our theoretical guarantees are derived via a rigorous analysis of the GD trajectories across the distinct activation patterns of the model. Specifically, we develop tight control over the system's dynamics to bound the trajectory of the decision boundary, overcoming the primary technical challenge introduced by the non-linear nature of the architecture.
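Since the abstract pins down the exact minimal setting, a two-neuron ReLU network trained by GD on two instances, and an empirically observed $\Theta(1/\ln(t))$ margin rate, a small simulation makes the claim concrete. The sketch below is not the authors' construction: the 1-D data, bias terms, logistic loss, step size, initialization, and the grid-based boundary search are all assumptions made purely for illustration.

```python
# Minimal, illustrative sketch (not the paper's exact construction):
# gradient descent on a two-neuron ReLU network with two 1-D training
# points of opposite label, tracking the distance from the training
# points to the decision boundary (the "robustness margin").
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def net(x, w, b, a):
    # Two-neuron ReLU network: f(x) = sum_k a_k * relu(w_k * x + b_k).
    return sum(a[k] * relu(w[k] * x + b[k]) for k in range(2))

# Two training instances with opposite labels (assumed data).
X = np.array([1.0, -1.0])
Y = np.array([1.0, -1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=2)   # assumed small random initialization
b = rng.normal(scale=0.1, size=2)
a = rng.normal(scale=0.5, size=2)
eta = 0.05                          # assumed step size

def robustness_margin(w, b, a, grid=np.linspace(-2.0, 2.0, 40001)):
    # Numerically locate sign changes of f on a grid (the decision
    # boundary) and return the smallest distance from a training point.
    vals = net(grid, w, b, a)
    crossings = grid[np.where(np.diff(np.sign(vals)) != 0)[0]]
    if crossings.size == 0:
        return np.nan
    return np.abs(X[:, None] - crossings[None, :]).min()

for t in range(1, 100001):
    grad_w, grad_b, grad_a = np.zeros(2), np.zeros(2), np.zeros(2)
    for xi, yi in zip(X, Y):
        pre = w * xi + b
        act = relu(pre)
        out = np.dot(a, act)
        # Derivative of the logistic loss log(1 + exp(-y*f)) w.r.t. f.
        d_out = -yi / (1.0 + np.exp(np.clip(yi * out, -60.0, 60.0)))
        grad_a += d_out * act
        grad_w += d_out * a * xi * (pre > 0)
        grad_b += d_out * a * (pre > 0)
    w -= eta * grad_w / 2
    b -= eta * grad_b / 2
    a -= eta * grad_a / 2
    if t in (100, 1000, 10000, 100000):
        print(f"t={t:6d}  margin={robustness_margin(w, b, a):.4f}")
```

Printing the margin at exponentially spaced iterations is a simple way to eyeball the claimed behaviour: under a $\Theta(1/\ln(t))$ rate, the gap to the limiting margin shrinks only modestly with each additional decade of iterations.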
Related papers
- Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization [66.08522228989634]
We establish the first global convergence result for neural networks in the two-stage least squares (2SLS) approach to nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD).
arXiv Detail & Related papers (2025-11-18T17:51:17Z) - Certified Neural Approximations of Nonlinear Dynamics [51.01318247729693]
In safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system. We propose a novel, adaptive, and parallelizable verification method based on certified first-order models.
arXiv Detail & Related papers (2025-05-21T13:22:20Z) - Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [4.554284689395686]
Implicit gradient descent (IGD) outperforms the common gradient descent (GD) algorithm in handling certain multi-scale problems. We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Provable Accelerated Convergence of Nesterov's Momentum for Deep ReLU Neural Networks [12.763567932588591]
Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape.
We consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show that Nesterov's momentum achieves acceleration in theory.
We provide two realizations of the problem class, one of which is deep ReLU networks, which, to the best of our knowledge, makes this work the first to prove an accelerated convergence rate for non-trivial neural network architectures.
arXiv Detail & Related papers (2023-06-13T19:55:46Z) - Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence [17.63517562327928]
This paper focuses on neural networks with two and three layers and provides a rigorous understanding of the properties of the solutions found by the stochastic heavy ball (SHB) method.
We show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network.
arXiv Detail & Related papers (2022-10-13T08:08:25Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean-field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Boundary Uncertainty in a Single-Stage Temporal Action Localization Network [12.364819165688628]
We show that both uncertainty modeling approaches improve the detection performance by more than $1.5\%$ in mAP@tIoU=0.5.
The proposed simple one-stage network performs comparably to more complex one- and two-stage networks.
arXiv Detail & Related papers (2020-08-25T17:04:39Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)