Stochastic Gradient Variance Reduction by Solving a Filtering Problem
- URL: http://arxiv.org/abs/2012.12418v1
- Date: Tue, 22 Dec 2020 23:48:42 GMT
- Title: Stochastic Gradient Variance Reduction by Solving a Filtering Problem
- Authors: Xingyi Yang
- Abstract summary: Deep neural networks (DNN) are typically optimized using stochastic gradient descent (SGD).
The estimation of the gradient using stochastic samples tends to be noisy and unreliable, resulting in large gradient variance and poor convergence.
We propose Filter Gradient Descent (FGD), an efficient optimization algorithm that produces a consistent estimate of the gradient.
- Score: 0.951828574518325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNN) are typically optimized using stochastic gradient
descent (SGD). However, the estimation of the gradient using stochastic samples
tends to be noisy and unreliable, resulting in large gradient variance and bad
convergence. In this paper, we propose Filter Gradient Descent (FGD),
an efficient stochastic optimization algorithm that makes the consistent
estimation of the local gradient by solving an adaptive filtering problem with
different design of filters. Our method reduces variance in stochastic gradient
descent by incorporating the historical states to enhance the current
estimation. It is able to correct noisy gradient direction as well as to
accelerate the convergence of learning. We demonstrate the effectiveness of the
proposed Filter Gradient Descent on numerical optimization and training neural
networks, where it achieves superior and robust performance compared with
traditional momentum-based methods. To the best of our knowledge, we are the
first to provide a practical solution that integrates filtering into gradient
estimation by making the analogy between gradient estimation and filtering
problems in signal processing. (The code is provided in
https://github.com/Adamdad/Filter-Gradient-Decent)
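To make the filtering analogy concrete, below is a minimal, hypothetical Python sketch of treating noisy minibatch gradients as a signal to be smoothed before taking a descent step. It uses a simple first-order (exponential moving average) filter on a toy quadratic objective; it is not the authors' released FGD implementation (see the repository linked above for that), and the objective, noise model, and hyperparameters are assumptions chosen for demonstration.

```python
import numpy as np

# Illustrative sketch only: a first-order low-pass (exponential moving average)
# filter applied to noisy gradients, in the spirit of treating gradient
# estimation as a filtering problem. Not the paper's exact FGD algorithm.

def noisy_grad(w, sigma, rng):
    """Gradient of f(w) = 0.5 * ||w||^2, corrupted by Gaussian noise."""
    return w + sigma * rng.standard_normal(w.shape)

def filtered_sgd(w0, lr=0.1, beta=0.9, sigma=1.0, steps=200, seed=0):
    """Gradient descent where each raw gradient is replaced by its filtered estimate."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    g_hat = np.zeros_like(w)                     # filter state: current gradient estimate
    for _ in range(steps):
        g = noisy_grad(w, sigma, rng)            # noisy observation of the true gradient
        g_hat = beta * g_hat + (1.0 - beta) * g  # low-pass filter: blend history with observation
        w -= lr * g_hat                          # descend along the smoothed direction
    return w

if __name__ == "__main__":
    w_final = filtered_sgd(np.ones(5))
    print("final iterate (noisy estimate of the optimum at 0):", w_final)
```

Richer designs (e.g., Wiener- or Kalman-style filters, in line with the abstract's mention of different filter designs) would replace the single exponential-moving-average update while keeping the same overall structure.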
Related papers
- Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
Quantiles of the gradient norm are used as clipping thresholds, which makes the method robust to heavy-tailed noise and outliers.
We propose an implementation of the algorithm using rolling quantiles. (A minimal illustrative sketch of quantile clipping appears after this list.)
arXiv Detail & Related papers (2023-09-29T15:24:48Z)
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data [7.513100214864646]
We show a worst-case convergence rate of $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$, measured via the gradient of the Moreau envelope, for smooth nonconvex optimization.
We obtain the first online nonnegative matrix factorization algorithms for dependent data, based on projected gradient methods with adaptive step sizes and optimal convergence.
arXiv Detail & Related papers (2022-03-29T17:59:10Z)
- Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant [0.0]
We introduce an efficient gradient variant, called the quadratic gradient, for privacy-preserving logistic regression training.
Experimental results demonstrate that the enhanced algorithms achieve significantly improved convergence speed.
There is a good chance that the quadratic gradient approach could integrate first-order gradient descent/ascent algorithms with second-order Newton-Raphson methods.
arXiv Detail & Related papers (2022-01-26T09:44:13Z)
- Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
Aggregated stochastic gradients serve as an estimate of the full gradient in the SARAH algorithm. (A minimal sketch of the SARAH recursion appears after this list.)
arXiv Detail & Related papers (2021-11-26T06:00:44Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning [21.796874356469644]
The passive stochastic gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated.
The algorithm performs substantially better in high-dimensional problems and incorporates variance reduction.
arXiv Detail & Related papers (2020-08-23T11:55:19Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
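As referenced in the "Robust Stochastic Optimization via Gradient Quantile Clipping" entry above, here is a minimal illustrative sketch of clipping gradients at a running quantile of their norms. The rolling-window quantile estimate, toy objective, and hyperparameters are assumptions made for demonstration only, not that paper's exact algorithm.

```python
import numpy as np
from collections import deque

def quantile_clip(grad, norm_history, q=0.9):
    """Scale `grad` so its norm does not exceed the q-quantile of recently seen norms."""
    norm = np.linalg.norm(grad)
    norm_history.append(norm)
    threshold = np.quantile(np.asarray(norm_history), q)
    if norm > threshold > 0.0:
        grad = grad * (threshold / norm)         # shrink overly large (outlier) gradients
    return grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.ones(10)
    history = deque(maxlen=100)                  # rolling window of recent gradient norms
    for _ in range(500):
        g = w + 2.0 * rng.standard_normal(10)    # noisy gradient of 0.5 * ||w||^2
        w -= 0.05 * quantile_clip(g, history, q=0.9)
    print("final iterate norm:", np.linalg.norm(w))
```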
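Similarly, for the "Random-reshuffled SARAH" entry above, the following sketch shows the SARAH recursive gradient estimator v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1} on a toy least-squares problem. For simplicity, v_0 is initialized with a full gradient here; the cited paper replaces that full computation with aggregated stochastic gradients. The problem setup and step size are illustrative assumptions.

```python
import numpy as np

def sarah(A, b, w0, lr=0.01, iters=600, seed=0):
    """SARAH-style recursive gradient estimation for f(w) = 0.5/n * ||A w - b||^2."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    w_prev = w0.copy()
    v = A.T @ (A @ w_prev - b) / n               # v_0: full gradient (average of per-sample gradients)
    w = w_prev - lr * v
    for _ in range(iters):
        i = rng.integers(n)                      # sample one data point
        g_new = A[i] * (A[i] @ w - b[i])         # per-sample gradient at the current iterate
        g_old = A[i] * (A[i] @ w_prev - b[i])    # per-sample gradient at the previous iterate
        v = g_new - g_old + v                    # recursive update of the gradient estimate
        w_prev, w = w, w - lr * v
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    w_est = sarah(A, A @ w_true, np.zeros(5))
    print("recovery error:", np.linalg.norm(w_est - w_true))
```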
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.