Related papers: Randomness and Interpolation Improve Gradient Descent

Randomness and Interpolation Improve Gradient Descent

URL: http://arxiv.org/abs/2510.13040v1
Date: Tue, 14 Oct 2025 23:32:01 GMT
Title: Randomness and Interpolation Improve Gradient Descent
Authors: Jiawen Li, Pascal Lefevre, Anwar Pp Abdul Majeed,
Abstract summary: The paper introduces two Avolutions, named Interpolational Gradient Descent (IAGD) and Noise-Regularized Gradient Descent (NRSGD)<n>IAGD uses Newton Interpolation to expedite the convergence process during training, assuming relevancy in gradients between iterations.<n>NRSGD incorporates a noise regularization technique that controlled noise to the gradients during the optimization process.
Score: 1.7551939488475077
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Based on Stochastic Gradient Descent (SGD), the paper introduces two optimizers, named Interpolational Accelerating Gradient Descent (IAGD) as well as Noise-Regularized Stochastic Gradient Descent (NRSGD). IAGD leverages second-order Newton Interpolation to expedite the convergence process during training, assuming relevancy in gradients between iterations. To avoid over-fitting, NRSGD incorporates a noise regularization technique that introduces controlled noise to the gradients during the optimization process. Comparative experiments of this research are conducted on the CIFAR-10, and CIFAR-100 datasets, benchmarking different CNNs(Convolutional Neural Networks) with IAGD and NRSGD against classical optimizers in Keras Package. Results demonstrate the potential of those two viable improvement methods in SGD, implicating the effectiveness of the advancements.

Related papers

Powering Up Zeroth-Order Training via Subspace Gradient Orthogonalization [40.95701844244596]
We show that ZO optimization can be substantially improved by unifying two complementary principles.<n>We instantiate in a new method, ZO-Muon, admitting a natural interpretation as a low-rank Muon in the ZO setting.
arXiv Detail & Related papers (2026-02-19T08:08:33Z)
Effective Dimension Aware Fractional-Order Stochastic Gradient Descent for Convex Optimization Problems [2.5971517743176915]
We propose 2SED Fractional-Order Gradient Descent (2SEDFOSGD) to adapt the fractional exponent in a data-driven manner.<n>Theoretically, this approach preserves the advantages of fractional memory without the sluggish or unstable behavior observed in na"ive fractional SGD.
arXiv Detail & Related papers (2025-03-17T22:57:37Z)
Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Gradient Descent (SGD) under heavy-tailed noise. Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise. We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
Variational Stochastic Gradient Descent for Deep Neural Networks [16.96187187108041]
Variational Gradient Descent (VSGD) is an efficient and effective gradient-based image optimization method.<n>We show that VSGD outperforms Adam and SGD on two classification datasets and four deep neural network architectures.
arXiv Detail & Related papers (2024-04-09T18:02:01Z)
Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $mathcalO( ln(T) / T 1 - frac1alpha ).
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
Signal Processing Meets SGD: From Momentum to Filter [6.751292200515355]
In deep learning, gradient descent (SGD) and its momentum-based variants are widely used for optimization.<n>In this paper, we analyze gradient behavior through a signal processing lens, isolating key factors that influence updates.<n>We introduce a novel method SGDF based on Wiener Filter principles, which derives an optimal time-varying gain to refine updates.
arXiv Detail & Related papers (2023-11-06T01:41:46Z)
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems. PINNs are trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features. In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problem [97.64313409741614]
We prove fast mixing and characterize the stationary distribution of the Langevin Algorithm for inverting random weighted DNN generators. We propose to do posterior sampling in the latent space of a pre-trained generative model.
arXiv Detail & Related papers (2022-06-18T03:47:37Z)
ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency. This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation. In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks. We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution. We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients. We prove new complexity that outperform state-of-the-art results in this case. We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.