How Do Adam and Training Strategies Help BNNs Optimization?
- URL: http://arxiv.org/abs/2106.11309v1
- Date: Mon, 21 Jun 2021 17:59:51 GMT
- Title: How Do Adam and Training Strategies Help BNNs Optimization?
- Authors: Zechun Liu, Zhiqiang Shen, Shichao Li, Koen Helwegen, Dong Huang,
Kwang-Ting Cheng
- Abstract summary: We show that Adam is better equipped to handle the rugged loss surface of BNNs and reaches a better optimum with higher generalization ability.
We derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset.
- Score: 50.22482900678071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The best performing Binary Neural Networks (BNNs) are usually attained using
Adam optimization and its multi-step training variants. However, to the best of
our knowledge, few studies explore the fundamental reasons why Adam is superior
to other optimizers like SGD for BNN optimization or provide analytical
explanations that support specific training strategies. To address this, in
this paper we first investigate the trajectories of gradients and weights in
BNNs during the training process. We show that the regularization effect of
second-order momentum in Adam is crucial to revitalizing the weights that are
dead due to activation saturation in BNNs. We find that Adam, through its
adaptive learning rate strategy, is better equipped to handle the rugged loss
surface of BNNs and reaches a better optimum with higher generalization
ability. Furthermore, we inspect the intriguing role of the real-valued weights
in binary networks, and reveal the effect of weight decay on the stability and
sluggishness of BNN optimization. Through extensive experiments and analysis,
we derive a simple training scheme, building on existing Adam-based
optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using
the same architecture as the state-of-the-art ReActNet, a 1.1% accuracy
improvement over that method. Code and models are available at
https://github.com/liuzechun/AdamBNN.
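To illustrate the mechanism the abstract points to, here is a minimal NumPy sketch (not the authors' code; the latent-weight setup and gradient values are assumptions for illustration) contrasting an SGD step with an Adam step on latent real-valued weights whose gradients have nearly vanished because of activation saturation. Adam's second raw moment rescales each coordinate, so even the "dead" coordinates receive updates comparable in size to the healthy ones.

```python
import numpy as np

# Minimal sketch (not the paper's code): in a BNN, real-valued latent weights
# are binarized with sign() in the forward pass; when activations saturate,
# the gradients of some latent weights become vanishingly small ("dead").
lr = 0.01
grad = np.array([1e-6, 2e-6, 0.5, -0.3, 1e-7])   # saturated vs. healthy grads

# Plain SGD: the step is proportional to the raw gradient, so the saturated
# coordinates barely move and their signs (the binary weights) rarely flip.
sgd_step = lr * grad

# Adam: the second raw moment v normalizes each coordinate, so even tiny
# gradients yield steps near lr in magnitude, revitalizing dead weights.
beta1, beta2, eps = 0.9, 0.999, 1e-8
m = (1 - beta1) * grad                  # first raw moment after one step
v = (1 - beta2) * grad ** 2             # second raw moment after one step
m_hat, v_hat = m / (1 - beta1), v / (1 - beta2)   # bias correction at t = 1
adam_step = lr * m_hat / (np.sqrt(v_hat) + eps)

print("SGD  step magnitudes:", np.abs(sgd_step))
print("Adam step magnitudes:", np.abs(adam_step))
```

In a BNN, larger latent-weight updates on the saturated coordinates make it more likely that their signs, and hence the corresponding binary weights, eventually flip.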
Related papers
- Variational Learning is Effective for Large Deep Networks [76.94351631300788]
We show that an Improved Variational Online Newton consistently matches or outperforms Adam for training large networks.
IVON's computational costs are nearly identical to Adam's, but its predictive uncertainty is better.
We find overwhelming evidence that variational learning is effective.
arXiv Detail & Related papers (2024-02-27T16:11:05Z)
- Weight Prediction Boosts the Convergence of AdamW [3.7485728774744556]
We introduce weight prediction into AdamW to boost its convergence when training deep neural network (DNN) models.
In particular, ahead of each mini-batch, we predict the future weights according to the update rule of AdamW and then apply the predicted future weights, as sketched below.
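A minimal sketch of the idea under stated assumptions (a single-step lookahead that reuses the previous gradient; the paper's exact prediction horizon and procedure may differ):

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    """One AdamW update with decoupled weight decay (illustrative)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

# Weight prediction (sketch): before processing a mini-batch, reuse the last
# gradient to predict where AdamW would move the weights, compute the batch's
# gradient at that predicted point, then commit the real update.
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
last_g = np.full(3, 0.1)                      # gradient from the previous step
w_pred, _, _ = adamw_step(w, last_g, m.copy(), v.copy(), t=1)
true_g = 2 * w_pred                           # hypothetical gradient at w_pred
w, m, v = adamw_step(w, true_g, m, v, t=1)
```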
arXiv Detail & Related papers (2023-02-01T02:58:29Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors (see the sketch below).
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
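For context, the sketch below shows the standard scale-factor parameterization that couples real-valued weights and scale factors (XNOR-Net style, w ≈ α·sign(w)); it illustrates the bilinear coupling only, not RBONN's recurrent optimization procedure.

```python
import numpy as np

# Common BNN parameterization: a real-valued weight vector w is approximated
# by a scale factor times its sign, w ≈ alpha * sign(w). The reconstruction
# error depends jointly (bilinearly) on alpha and the sign pattern, which is
# the coupling RBONN argues should be optimized together (sketch only).
w = np.array([0.7, -0.2, 1.3, -0.9])
b = np.sign(w)                      # binary weights in {-1, +1}
alpha = np.abs(w).mean()            # closed-form optimal scale for fixed b
approx = alpha * b
print("reconstruction error:", np.linalg.norm(w - approx) ** 2)
```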
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer [77.78479877473899]
We design a spatial-temporal-fusion BNN for efficiently scaling BNNs to large models.
Compared to vanilla BNNs, our approach greatly reduces the training time and the number of parameters, which helps scale BNNs efficiently.
arXiv Detail & Related papers (2021-12-12T17:13:14Z)
- "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization [92.23297927690149]
Batch normalization (BN) is a key facilitator and is considered essential for state-of-the-art binary neural networks (BNNs).
We extend their framework to training BNNs, and for the first time demonstrate that BNs can be completely removed from BNN training and inference regimes.
arXiv Detail & Related papers (2021-04-16T16:46:57Z)
- A Bop and Beyond: A Second Order Optimizer for Binarized Neural Networks [0.0]
Optimization of Binary Neural Networks (BNNs) relies on approximating the real-valued weights with their binarized representations.
In this paper, we take an approach parallel to Adam, which also uses the second raw moment estimate to normalize the first raw moment before comparing it with the threshold, as sketched below.
We present two versions of the proposed optimizer: a biased one and a bias-corrected one, each with its own applications.
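A minimal sketch of such a normalized, threshold-based binary update under stated assumptions (the EMA coefficients, threshold, and flipping rule are illustrative Bop-style choices, not the paper's exact algorithm):

```python
import numpy as np

def second_order_flip_step(b, g, m, v, beta1=0.9, beta2=0.999, tau=1.0, eps=1e-10):
    """Sketch: normalize the first raw moment by the root of the second raw
    moment (Adam-style), then flip binary weights whose normalized momentum
    exceeds the threshold and agrees in sign with the current weight."""
    m = beta1 * m + (1 - beta1) * g          # first raw moment (EMA of g)
    v = beta2 * v + (1 - beta2) * g ** 2     # second raw moment (EMA of g^2)
    signal = m / (np.sqrt(v) + eps)          # normalized momentum
    flip = (np.abs(signal) > tau) & (np.sign(signal) == np.sign(b))
    return np.where(flip, -b, b), m, v

# Toy usage: binary weights in {-1, +1} with a consistent gradient signal.
b = np.array([1.0, -1.0, 1.0])
m, v = np.zeros(3), np.zeros(3)
b, m, v = second_order_flip_step(b, g=np.array([0.2, 0.2, -0.2]), m=m, v=v)
print(b)   # only the first weight flips: its signal is large and sign-aligned
```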
arXiv Detail & Related papers (2021-04-11T22:20:09Z)
- S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued networks on the final prediction distribution (sketched below).
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
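A minimal sketch of distillation on the final prediction distribution under stated assumptions (a KL divergence between teacher and student softmax outputs with temperature 1; the paper's exact loss and calibration procedure may differ):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Guided distillation on the final prediction distribution (sketch): the
# real-valued teacher's softmax output supervises the binary student via a
# KL-divergence loss, instead of (or in addition to) hard labels.
teacher_logits = np.array([[2.0, 0.5, -1.0]])   # real-valued network output
student_logits = np.array([[1.2, 0.8, -0.5]])   # binary network output
p, q = softmax(teacher_logits), softmax(student_logits)
kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
print("distillation loss (KL):", float(kl.mean()))
```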
arXiv Detail & Related papers (2021-02-17T18:59:28Z)
- FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that the binarized convolution process becomes increasingly linear as it is pushed to minimize the binarization error, which in turn hampers the BNN's discriminative ability.
We re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z)