A Variant of Gradient Descent Algorithm Based on Gradient Averaging
- URL: http://arxiv.org/abs/2012.02387v2
- Date: Thu, 10 Dec 2020 06:48:52 GMT
- Title: A Variant of Gradient Descent Algorithm Based on Gradient Averaging
- Authors: Saugata Purkayastha and Sukannya Purkayastha
- Abstract summary: In regression tasks, the behaviour of Grad-Avg is observed to be almost identical with Stochastic Gradient Descent (SGD).
We show that Grad-Avg converges faster than other state-of-the-art optimizers for the classification task on two benchmark datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we study an optimizer, Grad-Avg to optimize error functions. We
establish the convergence of the sequence of iterates of Grad-Avg
mathematically to a minimizer (under boundedness assumption). We apply Grad-Avg
along with some of the popular optimizers on regression as well as
classification tasks. In regression tasks, it is observed that the behaviour of
Grad-Avg is almost identical with Stochastic Gradient Descent (SGD). We present
a mathematical justification of this fact. In the case of classification tasks, it
is observed that the performance of Grad-Avg can be enhanced by suitably
scaling the parameters. Experimental results demonstrate that Grad-Avg
converges faster than the other state-of-the-art optimizers for the
classification task on two benchmark datasets.
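As a sketch of the gradient-averaging idea (the exact Grad-Avg update rule is given in the paper), a step of this flavor averages the gradient at the current iterate with the gradient at a tentative plain gradient-descent step:

```python
import numpy as np

def grad_avg_step(x, grad, lr=0.1):
    """One gradient-averaging step: average the gradient at the current
    point with the gradient at a tentative plain-GD step, then move.
    (A sketch of the idea; see the paper for the exact Grad-Avg update.)"""
    g1 = grad(x)                      # gradient at the current iterate
    g2 = grad(x - lr * g1)            # gradient at the tentative GD step
    return x - lr * 0.5 * (g1 + g2)   # step along the averaged gradient

# Minimize f(x) = ||x||^2 / 2, whose gradient is simply x.
x = np.array([4.0, -2.0])
for _ in range(100):
    x = grad_avg_step(x, lambda v: v, lr=0.1)
```

On this quadratic the iterates contract toward the minimizer at the origin, consistent with the convergence result stated in the abstract.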
Related papers
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that the noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function.
We show that there is an interesting relationship between the degree of smoothing induced by SGD's noise and the well-studied 'sharpness' indicator.
arXiv Detail & Related papers (2023-11-15T07:27:40Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when "done right" -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Convergence of Batch Stochastic Gradient Descent Methods with
Approximate Gradients and/or Noisy Measurements: Theory and Computational
Results [0.9900482274337404]
We study convex optimization using a very general formulation called Batch Stochastic Gradient Descent (BSGD).
We establish conditions for BSGD to converge to the global minimum, based on approximation theory.
We show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge.
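The batch update itself can be sketched as follows (a generic minibatch step with a perturbed gradient oracle standing in for the paper's BSGD; the `noise` term models the approximate gradients and noisy measurements):

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_sgd(grad_batch, x0, lr=0.05, steps=200, batch=8, noise=0.01):
    """Batch SGD with a noisy/approximate gradient oracle: average the
    (possibly perturbed) per-sample gradients of a random batch, then
    take a step. A generic sketch, not the paper's exact formulation."""
    x = x0.astype(float)
    for _ in range(steps):
        g = grad_batch(x, batch) + noise * rng.standard_normal(x.shape)
        x = x - lr * g
    return x

# Quadratic objective f(x) = ||x||^2 / 2; each "sample" gradient is x plus noise.
def grad_batch(x, batch):
    per_sample = x + 0.1 * rng.standard_normal((batch,) + x.shape)
    return per_sample.mean(axis=0)

x_min = batch_sgd(grad_batch, np.array([3.0, -1.0]))
```

Despite the perturbed gradients, the averaged batch direction keeps the iterates contracting toward the minimizer, illustrating the robustness the summary describes.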
arXiv Detail & Related papers (2022-09-12T16:23:15Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of stochastic variance reduced gradient (SVRG).
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance via SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
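For context, the SVRG estimator that VCSG builds on combines a per-sample gradient with a snapshot correction; at the snapshot it reduces exactly to the full gradient (the $\lambda$-weighting of VCSG itself is not shown here):

```python
import numpy as np

def svrg_gradient(grad_i, x, x_snapshot, full_grad_snapshot, i):
    """Standard SVRG variance-reduced gradient estimator:
    g = grad_i(x) - grad_i(x_snapshot) + full_grad(x_snapshot).
    Unbiased, with variance shrinking as x approaches the snapshot;
    VCSG (per the summary) tempers this reduction with lambda."""
    return grad_i(x, i) - grad_i(x_snapshot, i) + full_grad_snapshot

# Least-squares example: f_i(x) = (a_i . x - b_i)^2 / 2
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))
b = A @ np.array([1.0, -2.0, 0.5])
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]

x_snap = np.zeros(3)
mu = (A.T @ (A @ x_snap - b)) / len(b)   # full gradient at the snapshot
g = svrg_gradient(grad_i, x_snap, x_snap, mu, i=0)
```

Evaluating the estimator at the snapshot point returns `mu` exactly, which is the zero-variance limit that VCSG's $\lambda$ is introduced to relax.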
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
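A generic ZO gradient estimator built from function queries alone (random-direction finite differences; the paper's coordinate importance sampling is a refinement of this idea and is not reproduced here):

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=2000, rng=None):
    """Zeroth-order gradient estimate from function queries only:
    average of random-direction central finite differences.
    (A generic ZO estimator, not the paper's specific scheme.)"""
    rng = rng or np.random.default_rng(0)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)           # random probe direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

f = lambda v: float(v @ v)        # f(x) = ||x||^2, true gradient is 2x
x = np.array([1.0, -3.0])
g_est = zo_gradient(f, x)
```

Each probe costs two function queries, which is why query cost, not just iteration count, matters for ZO methods.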
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: The random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
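One plausible reading of the random-top-k operator (keep the largest-magnitude entries, then fill the rest of the budget by uniform sampling; the paper's exact definition may differ):

```python
import numpy as np

def random_top_k(g, k_top, k_rand, rng=None):
    """Sparsify a gradient by keeping the k_top largest-magnitude entries
    plus k_rand entries sampled uniformly from the remainder.
    (An illustrative guess at the operator; see the paper for details.)"""
    rng = rng or np.random.default_rng(0)
    idx_sorted = np.argsort(-np.abs(g))       # indices by decreasing |g|
    keep = list(idx_sorted[:k_top])           # deterministic top-k part
    rest = idx_sorted[k_top:]
    keep += list(rng.choice(rest, size=k_rand, replace=False))  # random part
    sparse = np.zeros_like(g)
    sparse[keep] = g[keep]
    return sparse

g = np.array([0.1, -5.0, 0.3, 2.0, -0.2, 0.05])
s = random_top_k(g, k_top=2, k_rand=1)
```

Mixing a deterministic top-k with a random component preserves the largest coordinates while keeping the estimator exploratory over the small ones.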
arXiv Detail & Related papers (2020-01-27T08:23:58Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex min-max problems.
Our experiments empirically confirm the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence of adaptive gradient methods in expectation for nonconvex optimization.
Our analyses shed light on a better understanding of adaptive gradient methods and their gradient bounds in nonconvex optimization.
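A representative adaptive update of the kind analyzed in this line of work is Adam, which rescales each coordinate by running moment estimates of past gradients (generic Adam shown for illustration, not the specific method of the paper):

```python
import numpy as np

def adam_step(x, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive methods rescale the step per coordinate
    using running first/second moments of the gradient history."""
    m = b1 * m + (1 - b1) * g           # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * g * g       # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)           # bias-corrected second moment
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = ||x||^2 / 2, whose gradient is x.
x = np.array([2.0, -1.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 501):
    x, m, v = adam_step(x, x, m, v, t, lr=0.05)
```

The per-coordinate division by the second-moment estimate is precisely the "history of gradients" dependence that makes convergence analyses for these methods delicate.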
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.