A Variant of Gradient Descent Algorithm Based on Gradient Averaging
- URL: http://arxiv.org/abs/2012.02387v2
- Date: Thu, 10 Dec 2020 06:48:52 GMT
- Title: A Variant of Gradient Descent Algorithm Based on Gradient Averaging
- Authors: Saugata Purkayastha and Sukannya Purkayastha
- Abstract summary: In regression tasks, the behaviour of Grad-Avg is observed to be almost identical with Stochastic Gradient Descent (SGD).
We show that Grad-Avg converges faster than other state-of-the-art optimizers for the classification task on two benchmark datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study an optimizer, Grad-Avg to optimize error functions. We
establish the convergence of the sequence of iterates of Grad-Avg
mathematically to a minimizer (under boundedness assumption). We apply Grad-Avg
along with some of the popular optimizers on regression as well as
classification tasks. In regression tasks, it is observed that the behaviour of
Grad-Avg is almost identical with Stochastic Gradient Descent (SGD). We present
a mathematical justification of this fact. In the case of classification tasks, it
is observed that the performance of Grad-Avg can be enhanced by suitably
scaling the parameters. Experimental results demonstrate that Grad-Avg
converges faster than the other state-of-the-art optimizers for the
classification task on two benchmark datasets.
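As a sketch of the gradient-averaging idea (the exact Grad-Avg update rule is given in the paper), a step of this flavor averages the gradient at the current iterate with the gradient at a tentative plain gradient-descent step:

```python
import numpy as np

def grad_avg_step(x, grad, lr=0.1):
    """One gradient-averaging step: average the gradient at the current
    point with the gradient at a tentative plain-GD step, then move.
    (A sketch of the idea; see the paper for the exact Grad-Avg update.)"""
    g1 = grad(x)                      # gradient at the current iterate
    g2 = grad(x - lr * g1)            # gradient at the tentative GD step
    return x - lr * 0.5 * (g1 + g2)   # step along the averaged gradient

# Minimize f(x) = ||x||^2 / 2, whose gradient is simply x.
x = np.array([4.0, -2.0])
for _ in range(100):
    x = grad_avg_step(x, lambda v: v, lr=0.1)
```

On this quadratic the iterates contract toward the minimizer at the origin, consistent with the convergence result stated in the abstract.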
Related papers
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that the noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function.
We show that there is an interesting relationship between the degree of smoothing induced by SGD's noise and the well-studied 'sharpness' indicator.
arXiv Detail & Related papers (2023-11-15T07:27:40Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when "done right" -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Convergence of Batch Stochastic Gradient Descent Methods with
Approximate Gradients and/or Noisy Measurements: Theory and Computational
Results [0.9900482274337404]
We study convex optimization using a very general formulation called Batch Stochastic Gradient Descent (BSGD).
We establish conditions for BSGD to converge to the global minimum, based on approximation theory.
We show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge.
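The batch update itself can be sketched as follows (a generic minibatch step with a perturbed gradient oracle standing in for the paper's BSGD; the `noise` term models the approximate gradients and noisy measurements):

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_sgd(grad_batch, x0, lr=0.05, steps=200, batch=8, noise=0.01):
    """Batch SGD with a noisy/approximate gradient oracle: average the
    (possibly perturbed) per-sample gradients of a random batch, then
    take a step. A generic sketch, not the paper's exact formulation."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad_batch(x, batch) + noise * rng.standard_normal(x.shape)
        x = x - lr * g
    return x

# Quadratic objective f(x) = ||x||^2 / 2; each "sample" gradient is x plus noise.
def grad_batch(x, batch):
    per_sample = x + 0.1 * rng.standard_normal((batch,) + x.shape)
    return per_sample.mean(axis=0)

x_min = batch_sgd(grad_batch, np.array([3.0, -1.0]))
```

Despite the perturbed gradients, the averaged batch direction keeps the iterates contracting toward the minimizer, illustrating the robustness the summary describes.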
arXiv Detail & Related papers (2022-09-12T16:23:15Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of stochastic variance reduced gradient (SVRG).
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance via SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
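For context, the SVRG estimator that VCSG builds on combines a per-sample gradient with a snapshot correction; at the snapshot it reduces exactly to the full gradient (the $\lambda$-weighting of VCSG itself is not shown here):

```python
import numpy as np

def svrg_gradient(grad_i, x, x_snapshot, full_grad_snapshot, i):
    """Standard SVRG variance-reduced gradient estimator:
    g = grad_i(x) - grad_i(x_snapshot) + full_grad(x_snapshot).
    Unbiased, with variance shrinking as x approaches the snapshot;
    VCSG (per the summary) tempers this reduction with lambda."""
    return grad_i(x, i) - grad_i(x_snapshot, i) + full_grad_snapshot

# Least-squares example: f_i(x) = (a_i . x - b_i)^2 / 2
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
b = A @ np.array([1.0, -2.0, 0.5])
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]

x_snap = np.zeros(3)
mu = (A.T @ (A @ x_snap - b)) / len(b)   # full gradient at the snapshot
g = svrg_gradient(grad_i, x_snap, x_snap, mu, i=0)
```

Evaluating the estimator at the snapshot point returns `mu` exactly, which is the zero-variance limit that VCSG's $\lambda$ is introduced to relax.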
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
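A generic ZO gradient estimator built from function queries alone (random-direction finite differences; the paper's coordinate importance sampling is a refinement of this idea and is not reproduced here):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=2000, rng=None):
    """Zeroth-order gradient estimate from function queries only:
    average of random-direction central finite differences.
    (A generic ZO estimator, not the paper's specific scheme.)"""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)           # random probe direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

f = lambda v: float(v @ v)        # f(x) = ||x||^2, true gradient is 2x
x = np.array([1.0, -3.0])
g_est = zo_gradient(f, x)
```

Each probe costs two function queries, which is why query cost, not just iteration count, matters for ZO methods.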
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
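One plausible reading of the random-top-k operator (keep the largest-magnitude entries, then fill the rest of the budget by uniform sampling; the paper's exact definition may differ):

```python
import numpy as np

def random_top_k(g, k_top, k_rand, rng=None):
    """Sparsify a gradient by keeping the k_top largest-magnitude entries
    plus k_rand entries sampled uniformly from the remainder.
    (An illustrative guess at the operator; see the paper for details.)"""
    rng = rng or np.random.default_rng(0)
    idx_sorted = np.argsort(-np.abs(g))       # indices by decreasing |g|
    keep = list(idx_sorted[:k_top])           # deterministic top-k part
    rest = idx_sorted[k_top:]
    keep += list(rng.choice(rest, size=k_rand, replace=False))  # random part
    sparse = np.zeros_like(g)
    sparse[keep] = g[keep]
    return sparse

g = np.array([0.1, -5.0, 0.3, 2.0, -0.2, 0.05])
s = random_top_k(g, k_top=2, k_rand=1)
```

Mixing a deterministic top-k with a random component preserves the largest coordinates while keeping the estimator exploratory over the small ones.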
arXiv Detail & Related papers (2020-01-27T08:23:58Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex min-max problems.
Our experiments empirically confirm the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence of adaptive gradient methods in expectation for nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods and their gradient bounds in nonconvex optimization.
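A representative adaptive update of the kind analyzed in this line of work is Adam, which rescales each coordinate by running moment estimates of past gradients (generic Adam shown for illustration, not the specific method of the paper):

```python
import numpy as np

def adam_step(x, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive methods rescale the step per coordinate
    using running first/second moments of the gradient history."""
    m = b1 * m + (1 - b1) * g           # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * g * g       # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = ||x||^2 / 2, whose gradient is x.
x = np.array([2.0, -1.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 501):
    x, m, v = adam_step(x, x, m, v, t, lr=0.05)
```

The per-coordinate division by the second-moment estimate is precisely the "history of gradients" dependence that makes convergence analyses for these methods delicate.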
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.