Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence
and Experiments
- URL: http://arxiv.org/abs/2309.01248v1
- Date: Sun, 3 Sep 2023 19:21:59 GMT
- Title: Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence
and Experiments
- Authors: M. Soheil Shamaee, S. Fathi Hafshejani
- Abstract summary: This paper introduces a novel approach to the performance of the gradient descent (SGD) algorithm by incorporating a modified decay step size based on $frac1sqrttt.
The proposed step size integrates a logarithmic step term, leading to the selection of smaller values in the final iteration.
To the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST, andARAR datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel approach to enhance the performance of the
stochastic gradient descent (SGD) algorithm by incorporating a modified decay
step size based on $\frac{1}{\sqrt{t}}$. The proposed step size integrates a
logarithmic term, leading to the selection of smaller values in the final
iterations. Our analysis establishes a convergence rate of $O(\frac{\ln
T}{\sqrt{T}})$ for smooth non-convex functions without the
Polyak-{\L}ojasiewicz condition. To evaluate the effectiveness of our approach,
we conducted numerical experiments on image classification tasks using the
FashionMNIST, and CIFAR10 datasets, and the results demonstrate significant
improvements in accuracy, with enhancements of $0.5\%$ and $1.4\%$ observed,
respectively, compared to the traditional $\frac{1}{\sqrt{t}}$ step size. The
source code can be found at \\\url{https://github.com/Shamaeem/LNSQRTStepSize}.
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L0)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothnessed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates [15.27596975662702]
We explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients.
Our results match the minimax lower bound in citekamath2022, indicating that the theoretical limit of convex optimization under DP is achievable.
arXiv Detail & Related papers (2024-08-19T11:07:05Z) - Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization [48.84672493756553]
We propose a query-efficient and accurate estimator for gradients that uses only $Obig(slogfrac dsbig)$ queries per step.<n>Our proposed GraCe generalizes the Indyk--Price--Woodruff (IPW) algorithm in compressed sensing from linear measurements to nonlinear functions.
arXiv Detail & Related papers (2024-05-27T03:52:53Z) - Second-order Information Promotes Mini-Batch Robustness in Variance-Reduced Gradients [0.196629787330046]
We show that incorporating partial second-order information of the objective function can dramatically improve robustness to mini-batch size of variance-reduced gradient methods.
We demonstrate this phenomenon on a prototypical Newton ($textttMb-SVRN$) algorithm.
arXiv Detail & Related papers (2024-04-23T05:45:52Z) - New logarithmic step size for stochastic gradient descent [0.0]
We propose a novel warm restart technique using a new logarithmic step size for the gradient descent (SGD)
Our results show that the new logarithmic step improves test accuracy by 92% for the CIFAR100 dataset when we utilize a neuralal network (CNN) model.
arXiv Detail & Related papers (2024-04-01T17:25:27Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular due to minimizing the number of misses and good cache.
This paper delves into the convergence properties SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Sharper Rates and Flexible Framework for Nonconvex SGD with Client and
Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly non-color functions.
We generalize the $smallsfcolorgreen$ so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of optimal bound in the smooth non-color regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z) - Provably Efficient Convergence of Primal-Dual Actor-Critic with
Nonlinear Function Approximation [15.319335698574932]
We show the first efficient convergence result with primal-dual actor-critic with a convergence of $mathcalOleft ascent(Nright)Nright)$ under Polyian sampling.
Results on Open GymAI continuous control tasks.
arXiv Detail & Related papers (2022-02-28T15:16:23Z) - Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent [7.176107039687231]
We design step-size schemes that make gradient descent (SGD) adaptive to (i) the noise.
We prove that $T$ iterations of SGD with Nesterov iterations can be near optimal.
Compared to other step-size schemes, we demonstrate the effectiveness of a novel novel exponential step-size scheme.
arXiv Detail & Related papers (2021-10-21T19:22:14Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, em variance controlled gradient (VCSG), to improve the performance of the reduced gradient (SVRG)
$lambda$ is introduced in VCSG to avoid over-reducing a variance by SVRG.
$mathcalO(min1/epsilon3/2,n1/4/epsilon)$ number of gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - A New Framework for Variance-Reduced Hamiltonian Monte Carlo [88.84622104944503]
We propose a new framework of variance-reduced Hamiltonian Monte Carlo (HMC) methods for sampling from an $L$-smooth and $m$-strongly log-concave distribution.
We show that the unbiased gradient estimators, including SAGA and SVRG, based HMC methods achieve highest gradient efficiency with small batch size.
Experimental results on both synthetic and real-world benchmark data show that our new framework significantly outperforms the full gradient and gradient HMC approaches.
arXiv Detail & Related papers (2021-02-09T02:44:24Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of gradient descent (SGD)
We show that SGD avoids saddle points/manifolds with $1$ for strict step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Carath\'eodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carath'eodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.