Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration
- URL: http://arxiv.org/abs/2512.11587v1
- Date: Fri, 12 Dec 2025 14:16:35 GMT
- Title: Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration
- Authors: Alexander Tyurin,
- Abstract summary: We show that the steps of gradient descent (GD) reduce to those of generalized perceptron algorithms.<n>This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks.
- Score: 67.12978375116599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscillations, and especially its implicit acceleration, remains a challenging problem. We analyze nonlinear models with the logistic loss and show that the steps of GD reduce to those of generalized perceptron algorithms (Rosenblatt, 1958), providing a new perspective on the dynamics. This reduction yields significantly simpler algorithmic steps, which we analyze using classical linear algebra tools. Using these tools, we demonstrate on a minimalistic example that the nonlinearity in a two-layer model can provably yield a faster iteration complexity $\tilde{O}(\sqrt{d})$ compared to $Ω(d)$ achieved by linear models, where $d$ is the number of features. This helps explain the optimization dynamics and the implicit acceleration phenomenon observed in neural networks. The theoretical results are supported by extensive numerical experiments. We believe that this alternative view will further advance research on the optimization of neural networks.
Related papers
- Convergence of Stochastic Gradient Methods for Wide Two-Layer Physics-Informed Neural Networks [0.6319731355340598]
In practice, one often employs gradient descent type algorithms to train the neural network.<n>In this work, we establish the linear convergence of gradient descent / flow in training over-ized two layer PINNs for a general class of activation functions in the sense of high probability.
arXiv Detail & Related papers (2025-08-29T12:25:51Z) - Learning Optical Flow Field via Neural Ordinary Differential Equation [44.16275288019991]
Recent works on optical flow estimation use neural networks to predict the flow field that maps positions of one image to positions of the other.<n>We introduce a novel approach for predicting the derivative of the flow using a continuous model, namely neural ordinary differential equations (ODE)
arXiv Detail & Related papers (2025-06-03T18:30:14Z) - Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks [10.591718074748895]
We study the learning dynamics of large two-layer neural networks via dynamical mean field theory.<n>For large network width $m$, and large number of samples per input dimension $n/d$, the training dynamics exhibits a separation of timescales.
arXiv Detail & Related papers (2025-02-28T17:45:26Z) - Optimization Insights into Deep Diagonal Linear Networks [10.395029724463672]
We study the implicit regularization properties of the gradient flow "algorithm" for estimating the parameters of a deep diagonal neural network.<n>Our main contribution is showing that this gradient flow induces a mirror flow dynamic on the model, meaning that it is biased towards a specific solution of the problem.
arXiv Detail & Related papers (2024-12-21T20:23:47Z) - Training Hamiltonian neural networks without backpropagation [0.0]
We present a backpropagation-free algorithm to accelerate the training of neural networks for approximating Hamiltonian systems.
We show that our approach is more than 100 times faster with CPUs than the traditionally trained Hamiltonian Neural Networks.
arXiv Detail & Related papers (2024-11-26T15:22:30Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-eps optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Neural Dynamic Mode Decomposition for End-to-End Modeling of Nonlinear
Dynamics [49.41640137945938]
We propose a neural dynamic mode decomposition for estimating a lift function based on neural networks.
With our proposed method, the forecast error is backpropagated through the neural networks and the spectral decomposition.
Our experiments demonstrate the effectiveness of our proposed method in terms of eigenvalue estimation and forecast performance.
arXiv Detail & Related papers (2020-12-11T08:34:26Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network theory.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.