Gradient Correction beyond Gradient Descent
- URL: http://arxiv.org/abs/2203.08345v2
- Date: Fri, 26 May 2023 02:15:06 GMT
- Title: Gradient Correction beyond Gradient Descent
- Authors: Zefan Li, Bingbing Ni, Teng Li, WenJun Zhang, Wen Gao
- Abstract summary: The gradient used for back-propagation is arguably the most crucial aspect of training a neural network, and its quality can be degraded by noisy data, calculation errors, and algorithmic limitations.
We introduce a framework (GCGD) to perform gradient correction.
Experiment results show that our gradient correction framework can effectively improve gradient quality, reducing training epochs by roughly 20% and also improving network performance.
- Score: 63.33439072360198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The great success neural networks have achieved is inseparable from the
application of gradient-descent (GD) algorithms, and many variant algorithms have
emerged to improve the GD optimization process. The gradient used for
back-propagation is arguably the most crucial quantity in the training of a neural
network, and the quality of the calculated gradient can be affected by multiple
factors, e.g., noisy data, calculation error, and algorithm limitations. To reveal
gradient information beyond gradient descent, we introduce a framework
(\textbf{GCGD}) to perform gradient correction. GCGD consists of two plug-in
modules: 1) inspired by the idea of gradient prediction, we propose a \textbf{GC-W}
module for weight gradient correction; 2) based on Neural ODEs, we propose a
\textbf{GC-ODE} module for hidden-state gradient correction. Experiment results
show that our gradient correction framework can effectively improve the gradient
quality, reducing training epochs by $\sim$ 20\% and also improving network
performance.
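The abstract describes GC-W only at a high level (weight-gradient correction inspired by gradient prediction), so the snippet below is a minimal sketch of that general idea rather than the paper's module: it blends each raw gradient with a value extrapolated from the two previous gradients before the optimizer step. The linear-extrapolation rule, the blend weight, and the toy quadratic objective are all assumptions made for illustration.

```python
import numpy as np

def corrected_gradient(raw_grad, prev_grads, blend=0.5):
    """Blend the raw back-propagated gradient with a value predicted by
    linear extrapolation from the two most recent gradients.

    Illustrative stand-in for a gradient-prediction module; this is NOT
    the GC-W module from the paper.
    """
    if len(prev_grads) < 2:
        return raw_grad  # not enough history to predict from yet
    # g_pred = g_{t-1} + (g_{t-1} - g_{t-2})
    predicted = 2.0 * prev_grads[-1] - prev_grads[-2]
    return blend * raw_grad + (1.0 - blend) * predicted

# Toy quadratic objective f(w) = 0.5 * ||w||^2, whose exact gradient is w.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0])
history, lr = [], 0.1
for _ in range(100):
    noisy = w + 0.05 * rng.normal(size=w.shape)  # emulate a low-quality gradient
    g = corrected_gradient(noisy, history)
    history.append(noisy)
    w = w - lr * g
print(w)  # ends up near the minimizer [0, 0], up to the injected noise
```

In the GCGD framework this kind of correction would sit as a plug-in between back-propagation and the optimizer update; here a plain NumPy loop stands in for that pipeline.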
Related papers
- Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization [0.0]
In deep learning, Residual Networks (ResNets) have proven effective in addressing the vanishing gradient problem.
However, skip connections in ResNets can lead to gradient overlap, where gradients from the learned transformation and the skip connection combine.
We examine Z-score Normalization (ZNorm) as a technique to manage gradient overlap; a minimal sketch of z-score gradient normalization appears after this list.
arXiv Detail & Related papers (2024-10-28T21:54:44Z)
- Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks [5.507301894089302]
This paper is the first attempt to study a new optimization technique for deep neural networks, using the sum normalization of a gradient vector as coefficients.
The proposed technique is hence named adaptive gradient regularization (AGR).
arXiv Detail & Related papers (2024-07-24T02:23:18Z)
- How to guess a gradient [68.98681202222664]
We show that gradients are more structured than previously thought.
Exploiting this structure can significantly improve gradient-free optimization schemes.
We highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
arXiv Detail & Related papers (2023-12-07T21:40:44Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning framework based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable has a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
- AdaDGS: An adaptive black-box optimization method with a nonlocal directional Gaussian smoothing gradient [3.1546318469750196]
A directional Gaussian smoothing (DGS) approach was recently proposed in (Zhang et al., 2020) and used to define a truly nonlocal gradient, referred to as the DGS gradient, for high-dimensional black-box optimization.
We present a simple, yet ingenious and efficient, adaptive approach for optimization with the DGS gradient, which removes the need for hyperparameter fine-tuning.
arXiv Detail & Related papers (2020-11-03T21:20:25Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
- The duality structure gradient descent algorithm: analysis and applications to neural networks [0.0]
We propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis.
We empirically demonstrate the behavior of DSGD in several neural network training scenarios.
arXiv Detail & Related papers (2017-08-01T21:24:38Z)
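As referenced in the ZNorm entry above, the following is a minimal sketch of z-score normalization applied to a gradient before the update step. The per-tensor granularity, the least-squares toy problem, and the step size are assumptions made for illustration; the ZNorm paper itself defines where and how the normalization is applied inside residual networks.

```python
import numpy as np

def znorm(grad, eps=1e-8):
    """Standardize a gradient tensor to zero mean and unit variance.

    Illustrative per-tensor z-score normalization, not the exact
    placement used in the ZNorm paper.
    """
    return (grad - grad.mean()) / (grad.std() + eps)

# Toy example: one normalized-gradient step on a least-squares loss.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
grad = X.T @ (X @ w - y) / len(y)  # gradient of 0.5 * mean squared residual
w -= 0.1 * znorm(grad)             # update with the standardized gradient
print(w)
```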
This list is automatically generated from the titles and abstracts of the papers in this site.