Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training
- URL: http://arxiv.org/abs/2308.06689v2
- Date: Fri, 25 Aug 2023 13:51:25 GMT
- Title: Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training
- Authors: Xiao-Ming Wu, Dian Zheng, Zuhao Liu, Wei-Shi Zheng
- Abstract summary: Binarization of neural networks is a dominant paradigm in neural network compression.
We propose Rectified Straight Through Estimator (ReSTE) to balance the estimating error and the gradient stability.
ReSTE has excellent performance and surpasses the state-of-the-art methods without any auxiliary modules or losses.
- Score: 35.090598013305275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binarization of neural networks is a dominant paradigm in neural
network compression. The pioneering work BinaryConnect uses the Straight Through Estimator
(STE) to mimic the gradients of the sign function, but it also causes the
crucial inconsistency problem. Most previous methods design different
estimators to replace STE and mitigate this problem. However, they overlook the fact
that reducing the estimating error concomitantly decreases gradient stability.
The resulting highly divergent gradients harm model training
and increase the risk of gradient vanishing and gradient exploding. To fully
take gradient stability into consideration, we present a new perspective on
binary neural network (BNN) training, regarding it as an equilibrium between the
estimating error and the gradient stability. In this view, we first design two indicators to
quantitatively demonstrate the equilibrium phenomenon. In addition, in order to
balance the estimating error and the gradient stability well, we revise the
original straight through estimator and propose a power function based
estimator, the Rectified Straight Through Estimator (ReSTE for short). Compared to
other estimators, ReSTE is rational and capable of flexibly balancing the
estimating error with the gradient stability. Extensive experiments on CIFAR-10
and ImageNet datasets show that ReSTE has excellent performance and surpasses
the state-of-the-art methods without any auxiliary modules or losses.
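To make the idea concrete, below is a minimal PyTorch sketch of sign binarization with a plain straight-through backward pass, alongside a power-function-based rectified variant in the spirit of ReSTE. It is an illustration based only on the abstract: the exponent `o`, the clipping threshold `t`, and the exact surrogate derivative are assumptions for this sketch, not the authors' published formulation.

```python
import torch


class STESign(torch.autograd.Function):
    """Plain straight-through estimator (STE), as in BinaryConnect:
    forward binarizes with sign(x); backward passes the gradient
    through unchanged inside a clipping window."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Identity gradient, clipped to |x| <= 1 (the usual hard-tanh window).
        return grad_output * (x.abs() <= 1.0).to(grad_output.dtype)


class ReSTESign(torch.autograd.Function):
    """Illustrative sketch of a power-function-based rectified estimator in the
    spirit of ReSTE: the backward pass follows the derivative of
    y = sign(x) * |x|**(1/o), so the exponent `o` trades estimating error
    against gradient stability (o -> 1 recovers STE-like behaviour).
    The parameter names (`o`, `t`) and the clamping are assumptions of this
    sketch, not the paper's exact formulation."""

    @staticmethod
    def forward(ctx, x, o=3.0, t=1.5):
        ctx.save_for_backward(x)
        ctx.o, ctx.t = o, t
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        o, t = ctx.o, ctx.t
        # d/dx [|x|**(1/o)] = (1/o) * |x|**(1/o - 1); clamp |x| away from 0 to
        # keep the surrogate gradient bounded, and zero it outside [-t, t].
        scale = (1.0 / o) * x.abs().clamp(min=0.1).pow(1.0 / o - 1.0)
        scale = scale * (x.abs() <= t).to(grad_output.dtype)
        return grad_output * scale, None, None


if __name__ == "__main__":
    w = torch.randn(4, requires_grad=True)
    ReSTESign.apply(w).sum().backward()
    print(w.grad)  # surrogate gradients of the sign binarization
```

As the exponent `o` grows, the surrogate derivative concentrates near zero, which lowers the estimating error of the sign function but makes gradients more divergent; values of `o` near 1 behave like plain STE. This is one way to picture the error/stability trade-off the paper balances.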
Related papers
- The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition [84.51875325962061]
We propose a gradient-driven training mechanism to tackle the long-tail problem.
We introduce a new family of gradient-driven loss functions, namely equalization losses.
Our method consistently outperforms the baseline models.
arXiv Detail & Related papers (2022-10-11T16:00:36Z)
- Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
arXiv Detail & Related papers (2022-04-18T17:53:44Z)
- Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z)
- Coupled Gradient Estimators for Discrete Latent Variables [41.428359609999326]
Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators.
We introduce a novel derivation of their estimator based on importance sampling and statistical couplings.
We show that our proposed categorical gradient estimators provide state-of-the-art performance.
arXiv Detail & Related papers (2021-06-15T11:28:44Z)
- Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases adversarial accuracy by 0.1 to 0.3 percentage points across various state-of-the-art adversarially trained models.
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by Equilibrium Propagation (EP) does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)
- Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.