Related papers: Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

URL: http://arxiv.org/abs/2409.06542v1
Date: Tue, 10 Sep 2024 14:15:56 GMT
Title: Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm
Authors: Jinwei Zhao, Marco Gori, Alessandro Betti, Stefano Melacci, Hongtao Zhang, Jiedong Liu, Xinhong Hei,
Abstract summary: Gradient descent (GD) and gradient descent (SGD) have been widely used in a number of application domains. This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow.
Score: 56.06235614890066
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Gradient descent (GD) and stochastic gradient descent (SGD) have been widely used in a large number of application domains. Therefore, understanding the dynamics of GD and improving its convergence speed is still of great importance. This paper carefully analyzes the dynamics of GD based on the terminal attractor at different stages of its gradient flow. On the basis of the terminal sliding mode theory and the terminal attractor theory, four adaptive learning rates are designed. Their performances are investigated in light of a detailed theoretical investigation, and the running times of the learning procedures are evaluated and compared. The total times of their learning processes are also studied in detail. To evaluate their effectiveness, various simulation results are investigated on a function approximation problem and an image classification problem.

Related papers

AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent [58.05410015124021]
We introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learning rate at a given iteration.<n> Empirical results suggest strong performance of the method on a variety of traditional optimization problems and machine learning tasks.
arXiv Detail & Related papers (2025-05-27T18:25:21Z)
Parallel Momentum Methods Under Biased Gradient Estimations [11.074080383657453]
Parallel gradient methods are gaining prominence in solving large-scale machine learning problems that involve data distributed across multiple nodes. However, obtaining unbiased bounds, which have been the focus of most theoretical research, is challenging in many machine learning applications. In this paper we work out the implications for special gradient where estimates are biased, i.e. in meta-learning and when gradients are compressed or clipped.
arXiv Detail & Related papers (2024-02-29T18:03:03Z)
Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation [3.328448170090945]
Gradient Descent (SGD) with adaptive steps is widely used to train deep neural networks and generative models. This paper provides a comprehensive analysis of the effect of bias on gradient functions.
arXiv Detail & Related papers (2024-02-05T10:17:36Z)
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes. We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
On discretisation drift and smoothness regularisation in neural network training [0.0]
We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. We derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regulariser
arXiv Detail & Related papers (2023-10-21T15:21:36Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems. PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z)
Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity [40.73281056650241]
We introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true gradient temporal difference learning algorithms. We show how gradient TD reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function.
arXiv Detail & Related papers (2020-06-06T21:04:21Z)
Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule. We introduce a "grafting" experiment which decouples an update's magnitude from its direction. We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
arXiv Detail & Related papers (2020-02-26T21:42:49Z)
The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory. We show that using a large learning rate in the initial phase of training reduces the variance of the gradient. We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.