Distillation Guided Residual Learning for Binary Convolutional Neural Networks
- URL: http://arxiv.org/abs/2007.05223v2
- Date: Mon, 27 Jul 2020 01:21:29 GMT
- Title: Distillation Guided Residual Learning for Binary Convolutional Neural Networks
- Authors: Jianming Ye, Shiliang Zhang, Jingdong Wang
- Abstract summary: It is challenging to bridge the performance gap between Binary CNN (BCNN) and Floating point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of BCNN and FCNN.
To minimize the performance gap, we train the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to more effective optimization of the BCNN.
- Score: 83.6169936912264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is challenging to bridge the performance gap between Binary CNN (BCNN)
and Floating point CNN (FCNN). We observe that this performance gap leads to
substantial residuals between the intermediate feature maps of BCNN and FCNN. To
minimize the performance gap, we train the BCNN to produce intermediate feature
maps similar to those of the FCNN. This training strategy, i.e., optimizing each
binary convolutional block with a block-wise distillation loss derived from the
FCNN, leads to more effective optimization of the BCNN. It also motivates us to
update the binary convolutional block architecture to facilitate the optimization
of the block-wise distillation loss. Specifically, a lightweight shortcut branch
is inserted into each binary convolutional block to complement the residuals at
each block. Benefiting from its Squeeze-and-Interaction (SI) structure, this
shortcut branch introduces only a small parameter overhead, e.g., about 10%, yet
effectively complements the residuals. Extensive experiments on ImageNet
demonstrate the superior performance of our method in both classification
efficiency and accuracy; e.g., a BCNN trained with our method achieves an
accuracy of 60.45% on ImageNet.
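The block-wise distillation objective lends itself to a compact sketch. The following PyTorch fragment is a minimal illustration, not the authors' released code; the `bcnn`/`fcnn` interfaces that return per-block feature maps alongside the logits are hypothetical.

```python
# Minimal sketch of block-wise distillation: each binary block is
# trained to reproduce the intermediate feature map of the matching
# full-precision block. Not the authors' released code.
import torch
import torch.nn.functional as F

def block_wise_distillation_loss(bcnn_feats, fcnn_feats):
    # One MSE term per convolutional block; the FCNN features act as
    # fixed targets, so they are detached from the graph.
    return sum(F.mse_loss(b, f.detach())
               for b, f in zip(bcnn_feats, fcnn_feats))

def train_step(bcnn, fcnn, x, y, kd_weight=1.0):
    # Hypothetical interface: models return (logits, per-block features).
    logits, bcnn_feats = bcnn(x)          # binary student
    with torch.no_grad():
        _, fcnn_feats = fcnn(x)           # frozen full-precision teacher
    loss = F.cross_entropy(logits, y) \
         + kd_weight * block_wise_distillation_loss(bcnn_feats, fcnn_feats)
    return loss
```

Detaching the teacher's feature maps keeps the FCNN fixed, so gradients flow only into the binary blocks.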
Related papers
- Optimizing data-flow in Binary Neural Networks [0.0]
We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline.
We also present an optimized implementation of the Binary Direct Convolution for ARM instruction sets.
Our experiments show a consistent improvement in inference speed (up to 1.91x and 2.73x compared to two state-of-the-art BNN frameworks) with no drop in accuracy for at least one full-precision model (see the arithmetic sketch after this entry).
arXiv Detail & Related papers (2023-04-03T13:16:33Z)
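Binary convolution of the kind the entry above optimizes reduces multiply-accumulates to XNOR and popcount when weights and activations are constrained to {-1, +1}. A toy NumPy illustration of that identity (not the paper's ARM implementation):

```python
# Toy illustration of the XNOR/popcount arithmetic behind binary
# convolution; not the paper's Binary Direct Convolution for ARM.
import numpy as np

def binary_dot(a_bits: np.ndarray, w_bits: np.ndarray) -> int:
    """Dot product of two {-1,+1} vectors stored as {0,1} bits.

    With n = len(a_bits): <a, w> = n - 2 * popcount(a XOR w),
    since equal bits contribute +1 and differing bits contribute -1.
    """
    n = a_bits.size
    return n - 2 * int(np.count_nonzero(a_bits ^ w_bits))

# Sanity check against the real-valued dot product.
rng = np.random.default_rng(0)
a = rng.integers(0, 2, 64)            # activations as bits
w = rng.integers(0, 2, 64)            # weights as bits
assert binary_dot(a, w) == int(((2 * a - 1) * (2 * w - 1)).sum())
```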
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE), which not only optimizes the selection process end-to-end but also maintains the non-repetitive occupancy of the selected codewords (the vanilla STE it builds on is sketched after this entry).
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
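For background, PSTE specializes the standard straight-through estimator (STE) that virtually all BNN training relies on. Here is the vanilla STE in PyTorch; this is a generic sketch of the baseline technique, not PSTE itself:

```python
# Vanilla straight-through estimator: sign() has zero gradient almost
# everywhere, so the backward pass treats binarization as the identity,
# clipped to [-1, 1]. PSTE is a specialized variant of this idea.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)          # forward: hard binarization

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass the gradient through, but only where |x| <= 1.
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)
```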
- Basic Binary Convolution Unit for Binarized Image Restoration Network [146.0988597062618]
In this study, we reconsider the components of binary convolution, such as the residual connection, BatchNorm, activation function, and structure, for image restoration tasks.
Based on our findings and analyses, we design a simple yet efficient basic binary convolution unit (BBCU).
Our BBCU significantly outperforms other BNNs and lightweight models, showing that it can serve as a basic unit for binarized image restoration networks (a generic sketch of such a unit follows this entry).
arXiv Detail & Related papers (2022-10-02T01:54:40Z)
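As a rough picture of the components the BBCU study re-examines, the following is a generic binarized convolution unit with a full-precision residual path; it is a hedged sketch of that family of designs, not the exact BBCU architecture.

```python
# Generic binarized conv unit with a residual connection, BatchNorm,
# and activation -- the components discussed above. Not the exact BBCU.
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarize_ste(x):
    # sign() in the forward pass, identity gradient in the backward pass.
    return x + (torch.sign(x) - x).detach()

class BinaryConvUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.PReLU(channels)

    def forward(self, x):
        # Binarize both activations and weights before the convolution.
        out = F.conv2d(binarize_ste(x), binarize_ste(self.weight), padding=1)
        # The full-precision shortcut keeps information the 1-bit path loses.
        return x + self.act(self.bn(out))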
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Elastic-Link for Binarized Neural Network [9.83865304744923]
"Elastic-Link" (EL) module enrich information flow within a BNN by adaptively adding real-valued input features to the subsequent convolutional output features.
EL produces a significant improvement on the challenging large-scale ImageNet dataset.
With the integration of ReActNet, it yields a new state-of-the-art result of 71.9% top-1 accuracy.
arXiv Detail & Related papers (2021-12-19T13:49:29Z)
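The adaptive real-valued shortcut idea in Elastic-Link can be pictured as a learnable, channel-wise gate. This is an illustrative guess at the general mechanism, not the paper's exact EL module:

```python
# Illustrative Elastic-Link-style gate: a learnable channel-wise blend
# of real-valued input features into the binary convolution's output.
# Not the paper's exact module.
import torch
import torch.nn as nn

class ElasticLinkGate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, real_input, binary_out):
        # sigmoid(gate) in (0, 1) decides, per channel, how much of the
        # real-valued input is re-injected into the binary path.
        return binary_out + torch.sigmoid(self.gate) * real_input
```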
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization [92.23297927690149]
Batch normalization (BN) is a key facilitator and is considered essential for state-of-the-art binary neural networks (BNNs).
We extend their framework to training BNNs, and for the first time demonstrate that BN can be completely removed from BNN training and inference regimes.
arXiv Detail & Related papers (2021-04-16T16:46:57Z)
- FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations [20.218382369944152]
Binary neural networks (BNNs) have 1-bit weights and activations.
However, BNNs tend to produce much lower accuracy on realistic datasets such as ImageNet.
This work proposes FracBNN, which exploits fractional activations to substantially improve the accuracy of BNNs.
arXiv Detail & Related papers (2020-12-22T17:49:30Z)
- FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that the binarized convolution process exhibits increasing linearity towards the target of minimizing quantization error, which in turn hampers the BNN's discriminative ability.
We re-investigate and tune proper non-linear modules to fix this contradiction, leading to a strong baseline that achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z)
- SoFAr: Shortcut-based Fractal Architectures for Binary Convolutional Neural Networks [7.753767947048147]
We propose two Shortcut-based Fractal Architectures (SoFAr) specifically designed for Binary Convolutional Neural Networks (BCNNs).
Our proposed SoFAr combines shortcuts and fractal architectures in one unified model, which is helpful for the training of BCNNs.
Results show that our proposed SoFAr achieves better accuracy compared with shortcut-based BCNNs.
arXiv Detail & Related papers (2020-09-11T10:00:47Z)