ReCU: Reviving the Dead Weights in Binary Neural Networks
- URL: http://arxiv.org/abs/2103.12369v1
- Date: Tue, 23 Mar 2021 08:11:20 GMT
- Title: ReCU: Reviving the Dead Weights in Binary Neural Networks
- Authors: Zihan Xu, Mingbao Lin, Jianzhuang Liu, Jie Chen, Ling Shao, Yue Gao,
Yonghong Tian, Rongrong Ji
- Abstract summary: We explore the influence of "dead weights", which refer to a group of weights that are barely updated during the training of BNNs.
We prove that reviving the "dead weights" by ReCU can result in a smaller quantization error.
Our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet.
- Score: 153.6789340484509
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Binary neural networks (BNNs) have received increasing attention due to their
superior reductions of computation and memory. Most existing works focus on
either lessening the quantization error by minimizing the gap between the
full-precision weights and their binarization or designing a gradient
approximation to mitigate the gradient mismatch, while leaving the "dead
weights" untouched. This leads to slow convergence when training BNNs. In this
paper, for the first time, we explore the influence of "dead weights", which
refer to a group of weights that are barely updated during the training of
BNNs, and then introduce a rectified clamp unit (ReCU) to revive the "dead
weights" for updating. We prove that reviving the "dead weights" by ReCU can
result in a smaller quantization error. Besides, we also take into account the
information entropy of the weights, and then mathematically analyze why the
weight standardization can benefit BNNs. We demonstrate the inherent
contradiction between minimizing the quantization error and maximizing the
information entropy, and then propose an adaptive exponential scheduler to
identify the range of the "dead weights". By considering the "dead weights",
our method offers not only faster BNN training, but also state-of-the-art
performance on CIFAR-10 and ImageNet, compared with recent methods. Code is
available at https://github.com/z-hXu/ReCU.
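The abstract gives the high-level recipe: identify the rarely-updated "dead weights", clamp them back into a trainable range with ReCU, account for the information entropy of the weights via weight standardization, and set the clamping range with an adaptive exponential scheduler. The snippet below is a minimal PyTorch sketch of that recipe, not the authors' implementation (the repository linked above has that). It assumes the dead weights are the ones stuck in the distribution tails and that the clamp range comes from a quantile of the weight magnitudes; the fixed `tau` and all function names are likewise illustrative.

```python
import torch

def recu_like_clamp(w: torch.Tensor, tau: float = 0.99) -> torch.Tensor:
    """Clamp latent weights into [-Q(tau), Q(tau)], where Q is the tau-quantile
    of |w|. Tail weights ("dead weights") are pulled back toward the bulk of
    the distribution so they can still flip sign under gradient updates."""
    q = float(torch.quantile(w.detach().abs().flatten(), tau))
    return w.clamp(-q, q)

def standardize(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Zero-mean, unit-variance standardization of the latent weights."""
    return (w - w.mean()) / (w.std() + eps)

def binarize(w: torch.Tensor) -> torch.Tensor:
    """Sign binarization with a mean-magnitude scaling factor."""
    return w.sign() * w.abs().mean()

# One binary layer's forward pass: standardize, revive the tails, binarize.
# In actual BNN training, sign() would be paired with a straight-through
# estimator so gradients reach the latent weights; that is omitted here.
w_latent = torch.randn(256, 128, requires_grad=True)
w_binary = binarize(recu_like_clamp(standardize(w_latent), tau=0.99))
```

In the paper the clamping range is not fixed but driven by the adaptive exponential scheduler, which is how the stated tension between minimizing the quantization error and maximizing the information entropy is handled.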
Related papers
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
When the image translation model is initialized with the denoised weights predicted by our diffusion model, training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z) - OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks [19.41917323210239]
We investigate the efficiency of weight sign updates in Binary Neural Networks (BNNs).
For vanilla BNNs, over 50% of the weights keep their signs unchanged during training.
We propose Overcoming Silent Weights (OvSW) to address this issue.
arXiv Detail & Related papers (2024-07-07T05:01:20Z) - Weight Compander: A Simple Weight Reparameterization for Regularization [5.744133015573047]
We introduce weight compander, a novel effective method to improve generalization of deep neural networks.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
arXiv Detail & Related papers (2023-06-29T14:52:04Z) - InRank: Incremental Low-Rank Learning [85.6380047359139]
Gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training.
Existing training algorithms do not exploit the low-rank property to improve computational efficiency.
We design a new training algorithm Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices.
arXiv Detail & Related papers (2023-06-20T03:03:04Z) - Resilient Binary Neural Network [26.63280603795981]
We introduce a Resilient Binary Neural Network (ReBNN) to mitigate the frequent oscillation for better BNNs' training.
Our ReBNN achieves 66.9% Top-1 accuracy with ResNet-18 backbone on the ImageNet dataset.
arXiv Detail & Related papers (2023-02-02T08:51:07Z) - Long-Tailed Recognition via Weight Balancing [66.03068252811993]
Naive training produces models that are biased toward common classes, attaining higher accuracy on them.
We investigate three techniques to balance weights, L2-normalization, weight decay, and MaxNorm.
Our approach achieves the state-of-the-art accuracy on five standard benchmarks.
arXiv Detail & Related papers (2022-03-27T03:26:31Z) - SiMaN: Sign-to-Magnitude Network Binarization [165.5630656849309]
We show that our weight binarization provides an analytical solution that encodes high-magnitude weights as +1 and the rest as 0.
We prove that the learned weights of binarized networks roughly follow a Laplacian distribution that does not allow entropy maximization.
Our method, dubbed sign-to-magnitude network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2021-02-16T07:03:51Z) - Direct Quantization for Training Highly Accurate Low Bit-width Deep
Neural Networks [73.29587731448345]
This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations.
First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights.
Second, to obtain low bit-width activations, existing works consider all channels equally.
arXiv Detail & Related papers (2020-12-26T15:21:18Z) - FU-net: Multi-class Image Segmentation Using Feedback Weighted U-net [5.193724835939252]
We present a generic deep convolutional neural network (DCNN) for multi-class image segmentation.
It is based on a well-established supervised end-to-end DCNN model, known as U-net.
arXiv Detail & Related papers (2020-04-28T13:08:14Z) - Train-by-Reconnect: Decoupling Locations of Weights from their Values [6.09170287691728]
We show that untrained deep neural networks (DNNs) are different from trained ones.
We propose a novel method named Lookahead Permutation (LaPerm) to train DNNs by reconnecting the weights.
When the initial weights share a single value, our method finds a weight-agnostic neural network with far better-than-chance accuracy.
arXiv Detail & Related papers (2020-03-05T12:40:46Z)