"BNN - BN = ?": Training Binary Neural Networks without Batch
Normalization
- URL: http://arxiv.org/abs/2104.08215v1
- Date: Fri, 16 Apr 2021 16:46:57 GMT
- Title: "BNN - BN = ?": Training Binary Neural Networks without Batch
Normalization
- Authors: Tianlong Chen, Zhenyu Zhang, Xu Ouyang, Zechun Liu, Zhiqiang Shen,
Zhangyang Wang
- Abstract summary: Batch normalization (BN) is a key facilitator of, and is considered essential for, state-of-the-art binary neural networks (BNNs).
We extend the recent Batch Normalization Free (BN-Free) training framework to BNNs and, for the first time, demonstrate that BN can be completely removed from both BNN training and inference.
- Score: 92.23297927690149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization (BN) is a key facilitator and considered essential for
state-of-the-art binary neural networks (BNN). However, the BN layer is costly
to calculate and is typically implemented with non-binary parameters, leaving a
hurdle for the efficient implementation of BNN training. It also introduces
undesirable dependence between samples within each batch. Inspired by the
latest advances in Batch Normalization Free (BN-Free) training, we extend that
framework to BNNs and, for the first time, demonstrate that BN can be completely
removed from both BNN training and inference regimes. By plugging in and
customizing techniques including adaptive gradient clipping, scaled weight
standardization, and a specialized bottleneck block, a BN-free BNN is capable of
maintaining competitive accuracy compared to its BN-based counterpart.
Extensive experiments validate the effectiveness of our proposal across diverse
BNN backbones and datasets. For example, after removing BNs from the
state-of-the-art ReActNets, it can still be trained with our proposed
methodology to achieve 92.08%, 68.34%, and 68.0% accuracy on CIFAR-10,
CIFAR-100, and ImageNet respectively, with marginal performance drop
(0.23%~0.44% on CIFAR and 1.40% on ImageNet). Code and pre-trained models are
available at: https://github.com/VITA-Group/BNN_NoBN.
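To make the abstract's recipe concrete, below is a minimal, hedged sketch in PyTorch of the two plug-in ingredients it names: scaled weight standardization applied to the latent real-valued weights before binarization, and unit-wise adaptive gradient clipping applied to the gradients before each optimizer step. The names `BinaryConvWS` and `adaptive_gradient_clip` and the `clip_factor` value are illustrative assumptions, not the authors' released implementation; see the GitHub repository above for that.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinaryConvWS(nn.Conv2d):
    """Binary convolution whose latent real-valued weights are standardized
    (scaled weight standardization) before the sign() binarization step."""

    def forward(self, x):
        w = self.weight
        # Scaled weight standardization per output channel; fan-in = C_in/groups * kH * kW.
        # A learnable per-channel gain (as used in BN-Free networks) is omitted for brevity.
        fan_in = w[0].numel()
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True)
        w_hat = (w - mean) / torch.sqrt(var * fan_in + 1e-5)
        # Simplified binarizer with a straight-through estimator for the backward pass.
        w_bin = (torch.sign(w_hat) - w_hat).detach() + w_hat
        return F.conv2d(x, w_bin, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def adaptive_gradient_clip(parameters, clip_factor=0.02, eps=1e-3):
    """Unit-wise adaptive gradient clipping: shrink gradients whose norm is
    large relative to the norm of the corresponding weight unit."""
    for p in parameters:
        if p.grad is None or p.ndim < 2:
            continue  # skip biases/scalars and parameters without gradients
        dims = tuple(range(1, p.ndim))
        w_norm = p.detach().norm(dim=dims, keepdim=True).clamp_min(eps)
        g_norm = p.grad.detach().norm(dim=dims, keepdim=True).clamp_min(1e-6)
        scale = (clip_factor * w_norm / g_norm).clamp(max=1.0)
        p.grad.mul_(scale)
```
In a training loop, `adaptive_gradient_clip(model.parameters())` would sit between `loss.backward()` and `optimizer.step()`. The specialized bottleneck block mentioned in the abstract is an architectural change and is not sketched here.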
Related papers
- NAS-BNN: Neural Architecture Search for Binary Neural Networks [55.058512316210056]
We propose a novel neural architecture search scheme for binary neural networks, named NAS-BNN.
Our discovered binary model family outperforms previous BNNs for a wide range of operations (OPs) from 20M to 200M.
In addition, we validate the transferability of these searched BNNs on the object detection task, and our binary detectors with the searched BNNs achieve a new state-of-the-art result, e.g., 31.6% mAP with 370M OPs, on the MS COCO dataset.
arXiv Detail & Related papers (2024-08-28T02:17:58Z) - Overcoming Recency Bias of Normalization Statistics in Continual
Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z) - Multi-Objective Linear Ensembles for Robust and Sparse Training of Few-Bit Neural Networks [5.246498560938275]
We study the case of few-bit discrete-valued neural networks, both Binarized Neural Networks (BNNs) and Integer Neural Networks (INNs).
Our contribution is a multi-objective ensemble approach based on training a single NN for each possible pair of classes and applying a majority voting scheme to predict the final output.
We compare this BeMi approach to the current state-of-the-art in solver-based NN training and gradient-based training, focusing on BNN learning in few-shot contexts.
arXiv Detail & Related papers (2022-12-07T14:23:43Z) - An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates the data that requires BN from the data that does not.
arXiv Detail & Related papers (2022-11-03T12:12:56Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNN methods neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - Diagnosing Batch Normalization in Class Incremental Learning [39.70552266952221]
Batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence.
We propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias.
We show that BN Tricks can bring significant performance gains to all adopted baselines.
arXiv Detail & Related papers (2022-02-16T12:38:43Z) - Self-Distribution Binary Neural Networks [18.69165083747967]
We study binary neural networks (BNNs), in which both the weights and activations are binary (i.e., 1-bit representations).
We propose Self-Distribution Binary Neural Network (SD-BNN)
Experiments on CIFAR-10 and ImageNet datasets show that the proposed SD-BNN consistently outperforms the state-of-the-art (SOTA) BNNs.
arXiv Detail & Related papers (2021-03-03T13:39:52Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills binary networks from real-valued networks over the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that the binarized convolution process exhibits increasing linearity toward the target of minimizing such error, which in turn hampers the BNN's discriminative ability.
We re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z) - Distillation Guided Residual Learning for Binary Convolutional Neural
Networks [83.6169936912264]
It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a floating-point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and the FCNN.
To minimize the performance gap, we enforce the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to a more effective optimization of the BCNN (a minimal sketch of such a block-wise loss follows this list).
arXiv Detail & Related papers (2020-07-10T07:55:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.