One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware
Quantization Training
- URL: http://arxiv.org/abs/2401.16760v1
- Date: Tue, 30 Jan 2024 05:42:54 GMT
- Title: One-Step Forward and Backtrack: Overcoming Zig-Zagging in Loss-Aware
Quantization Training
- Authors: Lianbo Ma, Yuee Zhou, Jianlun Ma, Guo Yu, Qing Li
- Abstract summary: Weight quantization is an effective technique to compress deep neural networks for their deployment on edge devices with limited resources.
Traditional loss-aware quantization methods commonly use the quantized gradient to replace the full-precision gradient.
This paper proposes a one-step-forward-and-backtrack scheme for loss-aware quantization to obtain a more accurate and stable gradient direction.
- Score: 12.400950982075948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weight quantization is an effective technique to compress deep neural
networks for deployment on edge devices with limited resources.
Traditional loss-aware quantization methods commonly use the quantized gradient
to replace the full-precision gradient. However, we discover that this gradient
error leads to an unexpected zig-zagging issue in the gradient descent
learning procedure, where the gradient directions rapidly oscillate or
zig-zag, and this issue seriously slows down model convergence.
Accordingly, this paper proposes a one-step-forward-and-backtrack scheme for
loss-aware quantization that obtains a more accurate and stable gradient
direction to counter this issue. During gradient descent learning, a one-step
forward search is designed to find the trial gradient of the next step, which is
adopted to adjust the gradient of the current step towards the direction of fast
convergence. After that, we backtrack to the current step and update the
full-precision and quantized weights using both the current-step gradient and the
trial gradient. A series of theoretical analyses and experiments on benchmark
deep models demonstrate the effectiveness and competitiveness of the
proposed method; in particular, our method outperforms others in
convergence performance.
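The abstract's forward-search-then-backtrack loop can be sketched in code. This is a minimal illustration, not the paper's implementation: the uniform symmetric quantizer, the fixed mixing weight `beta`, and the toy quadratic loss are all assumptions introduced here.

```python
import numpy as np

def quantize(w, num_bits=2):
    # Simple uniform symmetric quantizer (a stand-in for the paper's scheme).
    scale = np.max(np.abs(w)) / (2 ** (num_bits - 1) - 1) + 1e-12
    return np.round(w / scale) * scale

def forward_backtrack_step(w_fp, grad_fn, lr=0.1, beta=0.5, num_bits=2):
    """One-step-forward-and-backtrack update (illustrative sketch).

    w_fp    : full-precision weights
    grad_fn : returns the loss gradient evaluated at quantized weights
    beta    : mixing weight between current and trial gradients (assumed)
    """
    g_cur = grad_fn(quantize(w_fp, num_bits))       # current-step gradient
    w_trial = w_fp - lr * g_cur                     # one-step forward search
    g_trial = grad_fn(quantize(w_trial, num_bits))  # trial gradient of the next step
    g_mix = (1 - beta) * g_cur + beta * g_trial     # stabilized direction
    w_new = w_fp - lr * g_mix                       # backtrack: redo the step
    return w_new, quantize(w_new, num_bits)

# Toy quadratic loss L(w) = 0.5 * ||w||^2, so the gradient is w itself.
w = np.array([1.0, -0.8])
for _ in range(50):
    w, w_q = forward_backtrack_step(w, lambda wq: wq)
```

Averaging the current and trial gradients damps the direction changes between consecutive steps, which is the intuition behind suppressing the zig-zagging.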
Related papers
- Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients [24.973203825917906]
We show that lowering the error for large-magnitude gradients boosts the quantization performance significantly.
We also introduce an interval update algorithm that adjusts the quantization interval adaptively to maintain a small quantization error for large gradients.
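An adaptive interval update of the kind this summary describes might look like the following sketch. The quantile-tracking rule and all parameter names are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def quantize_grad(g, interval, num_bits=4):
    # Clip to [-interval, interval], then apply uniform INT4-style quantization.
    levels = 2 ** (num_bits - 1) - 1
    g_clip = np.clip(g, -interval, interval)
    step = interval / levels
    return np.round(g_clip / step) * step

def update_interval(g, interval, target_ratio=0.99, rate=0.01):
    """Adaptively move the clipping interval toward the scale of the
    large-magnitude gradients (illustrative rule, not the paper's exact one)."""
    big = np.quantile(np.abs(g), target_ratio)
    return interval + rate * (big - interval)

rng = np.random.default_rng(0)
g = rng.normal(size=1000)
interval = 1.0
for _ in range(2000):
    interval = update_interval(g, interval)
q = quantize_grad(g, interval)
```

Tracking a high quantile keeps the interval wide enough that large-magnitude gradients incur a small clipping error, while not wasting quantization levels on outliers.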
arXiv Detail & Related papers (2024-07-17T15:06:12Z) - Point Cloud Denoising via Momentum Ascent in Gradient Fields [72.93429911044903]
A gradient-based method was proposed to estimate gradient fields from noisy point clouds using neural networks.
We develop a momentum gradient ascent method that leverages the information of previous iterations in determining the trajectories of the points.
Experiments demonstrate that the proposed method outperforms state-of-the-art approaches with a variety of point clouds, noise types, and noise levels.
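The momentum ascent idea can be sketched on a toy gradient field. In the real method the field is estimated by a neural network; here a hand-written field whose density peaks at the origin stands in for it, and the step sizes are assumptions.

```python
import numpy as np

def momentum_ascent_denoise(points, grad_field, steps=50, lr=0.1, mu=0.8):
    """Move noisy points along an estimated gradient field using momentum
    ascent (sketch; the original work learns the field with a network)."""
    v = np.zeros_like(points)
    for _ in range(steps):
        g = grad_field(points)  # points climb toward high-density regions
        v = mu * v + g          # momentum accumulates past directions
        points = points + lr * v
    return points

# Toy field: density peaks at 0, so the ascent direction points to the origin.
noisy = np.array([0.5, -0.3, 0.8])
clean = momentum_ascent_denoise(noisy, lambda p: -p)
```

The momentum term reuses information from previous iterations, which is the summary's point about determining the trajectories of the points.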
arXiv Detail & Related papers (2022-02-21T10:21:40Z) - On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient.
Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
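One way to see why a cheap implicit-gradient estimate helps: the exact gradient requires solving against (I - J), which a truncated series can approximate. The sketch below uses a truncated Neumann series on a toy linear map; the paper's phantom gradient is a related damped-unrolling estimate, not this exact formula.

```python
import numpy as np

def neumann_implicit_grad(J_vp, v, terms=20):
    """Approximate (I - J)^{-T} v with a truncated Neumann series,
    avoiding the costly exact solve used in implicit differentiation."""
    out = np.zeros_like(v)
    cur = v.copy()
    for _ in range(terms):
        out += cur
        cur = J_vp(cur)  # apply the (transposed) Jacobian once more
    return out

# Toy: J = 0.5 * I, so the exact answer is (I - J)^{-T} v = 2 * v.
v = np.array([1.0, 2.0])
approx = neumann_implicit_grad(lambda u: 0.5 * u, v)
```

Each term costs one Jacobian-vector product, so truncating the series trades accuracy for a much cheaper backward pass.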
arXiv Detail & Related papers (2021-11-09T14:40:24Z) - Adapting Stepsizes by Momentumized Gradients Improves Optimization and
Generalization [89.66571637204012]
AdaMomentum improves optimization on vision tasks and consistently achieves state-of-the-art results on other tasks, including language processing.
arXiv Detail & Related papers (2021-06-22T03:13:23Z) - Scaling transition from momentum stochastic gradient descent to plain
stochastic gradient descent [1.7874193862154875]
The momentum gradient descent uses the accumulated gradient as the updated direction of the current parameters.
Plain gradient descent, by contrast, is not corrected by the accumulated gradient.
The TSGD algorithm has faster training speed, higher accuracy and better stability.
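A scaling transition from momentum SGD to plain SGD can be sketched by blending the two update directions with a factor that decays over iterations. The schedule and hyperparameters below are assumptions for illustration, not TSGD's exact rule.

```python
import numpy as np

def transition_step(w, grad, state, t, lr=0.05, mu=0.9, decay=0.05):
    """Start like momentum SGD and decay toward plain SGD.
    alpha(t) is the transition factor (the schedule is an assumption)."""
    state["v"] = mu * state["v"] + grad               # accumulated direction
    alpha = 1.0 / (1.0 + decay * t)                   # transition factor -> 0
    direction = alpha * state["v"] + (1 - alpha) * grad
    return w - lr * direction

# Toy quadratic loss: the gradient is w itself.
w = np.array([2.0, -1.5])
state = {"v": np.zeros_like(w)}
for t in range(200):
    w = transition_step(w, w, state, t)
```

Early iterations benefit from momentum's fast progress; late iterations avoid momentum's overshooting near the optimum by falling back to the plain gradient.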
arXiv Detail & Related papers (2021-06-12T11:42:04Z) - Decreasing scaling transition from adaptive gradient descent to
stochastic gradient descent [1.7874193862154875]
We propose a decreasing scaling transition from adaptive gradient descent to gradient descent method DSTAda.
Our experimental results show that DSTAda has a faster speed, higher accuracy, and better stability and robustness.
arXiv Detail & Related papers (2021-06-12T11:28:58Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Variance Reduction for Deep Q-Learning using Stochastic Recursive
Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance.
This paper introduces the framework for updating the gradient estimates in deep Q-learning, achieving a novel algorithm called SRG-DQN.
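The stochastic recursive gradient estimator at the core of such methods (a SARAH-style update) can be sketched on a toy least-squares problem. The deep Q-learning setting is omitted; everything below is an illustrative stand-in.

```python
import numpy as np

# Toy objective: minimize the mean of 0.5 * (w - a_i)^2 over 20 samples.
rng = np.random.default_rng(0)
a = rng.normal(size=20)
grad_i = lambda w, i: w - a[i]  # per-sample gradient

w, lr = 5.0, 0.1
for epoch in range(30):
    # Outer step: anchor the estimate with a full-batch gradient.
    v = np.mean([grad_i(w, i) for i in range(20)])
    w_prev, w = w, w - lr * v
    for _ in range(10):
        # Recursive update: v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}.
        i = int(rng.integers(20))
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - lr * v
```

On this quadratic toy the per-sample curvatures are identical, so the recursive estimator is exact and the iterate converges like full-batch gradient descent; in general it merely reduces the variance of the gradient estimate.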
arXiv Detail & Related papers (2020-07-25T00:54:20Z) - Regularizing Meta-Learning via Gradient Dropout [102.29924160341572]
meta-learning models are prone to overfitting when there are no sufficient training tasks for the meta-learners to generalize.
We introduce a simple yet effective method to alleviate the risk of overfitting for gradient-based meta-learning.
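Dropping gradient entries at random can be sketched as below. This is only the core masking idea; how the original work samples the mask and where in the inner loop it is applied are not reproduced here.

```python
import numpy as np

def gradient_dropout(grad, drop_rate=0.2, rng=None):
    """Zero out a random subset of gradient entries (regularizer sketch;
    the original applies this to inner-loop gradients in meta-learning)."""
    rng = rng or np.random.default_rng()
    mask = rng.random(grad.shape) >= drop_rate
    return grad * mask  # surviving entries keep their values

rng = np.random.default_rng(0)
g = np.ones(10000)
g_dropped = gradient_dropout(g, drop_rate=0.2, rng=rng)
```

Randomizing which coordinates receive an update perturbs the inner-loop adaptation, which discourages the meta-learner from overfitting to the limited set of training tasks.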
arXiv Detail & Related papers (2020-04-13T10:47:02Z) - Explore Aggressively, Update Conservatively: Stochastic Extragradient
Methods with Variable Stepsize Scaling [34.35013145885164]
Extragradient methods have become a staple for solving large-scale saddlepoint problems in machine learning.
We show in this paper that running vanilla extragradient with stochastic gradients may jeopardize its convergence, even in simple bilinear models.
We show that a variable stepsize scaling modification allows the method to converge even with stochastic gradients, and we derive sharp convergence rates under an error bound condition.
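The basic (deterministic) extragradient step can be sketched on the classic bilinear saddle point min_x max_y xy, where simultaneous gradient descent-ascent famously diverges but the extrapolation step restores convergence. The stepsize is an illustrative choice.

```python
def extragradient_step(x, y, lr=0.1):
    """One extragradient step for the bilinear saddle point min_x max_y x*y."""
    # Extrapolation: look one gradient step ahead.
    x_half = x - lr * y   # d/dx (x*y) = y
    y_half = y + lr * x   # d/dy (x*y) = x
    # Update the actual iterate using gradients at the extrapolated point.
    x_new = x - lr * y_half
    y_new = y + lr * x_half
    return x_new, y_new

x, y = 1.0, 1.0
for _ in range(500):
    x, y = extragradient_step(x, y)
```

Plain descent-ascent multiplies the distance to the saddle by (1 + lr^2) per step and spirals outward; the extrapolated update contracts it instead, which is why extragradient is a staple for saddle-point problems.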
arXiv Detail & Related papers (2020-03-23T10:24:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.