Optimizing data-flow in Binary Neural Networks
- URL: http://arxiv.org/abs/2304.00952v1
- Date: Mon, 3 Apr 2023 13:16:33 GMT
- Title: Optimizing data-flow in Binary Neural Networks
- Authors: L. Vorabbi, D. Maltoni, S. Santi
- Abstract summary: We propose a novel training scheme that can increase data flow and parallelism in the BNN pipeline.
We also present an optimized implementation of the Binary Direct Convolution for ARM instruction sets.
Our experiments show a consistent improvement in inference speed (up to 1.91x and 2.73x compared to two state-of-the-art BNN frameworks) with no drop in accuracy for at least one full-precision model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary Neural Networks (BNNs) can significantly accelerate the inference time
of a neural network by replacing its expensive floating-point arithmetic with
bitwise operations. Most existing solutions, however, do not fully optimize
data flow through the BNN layers, and intermediate conversions from 1 to 16/32
bits often further hinder efficiency. We propose a novel training scheme that
can increase data flow and parallelism in the BNN pipeline; specifically, we
introduce a clipping block that decreases the data-width from 32 bits to 8.
Furthermore, we reduce the internal accumulator size of a binary layer, usually
kept at 32 bits to prevent data overflow, without losing accuracy.
Additionally, we provide an optimization of the Batch Normalization layer that
both reduces latency and simplifies deployment. Finally, we present an
optimized implementation of the Binary Direct Convolution for ARM instruction
sets. Our experiments show a consistent improvement in inference speed (up to
1.91x and 2.73x compared to two state-of-the-art BNN frameworks) with no drop
in accuracy for at least one full-precision model.
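The data path the abstract describes can be pictured with a minimal NumPy sketch. It assumes the standard XNOR + popcount formulation of a binary layer; the function names, the int16 accumulator width, and the integer BatchNorm thresholds are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def binarize(x):
    """Map real values to {+1, -1}, stored as 0/1 bits (1 encodes +1)."""
    return (x >= 0).astype(np.uint8)

def binary_dense(x_bits, w_bits, n):
    """XNOR + popcount dot product over n binary inputs.

    popcount counts matching signs; dot = 2 * popcount - n recovers the
    {+1, -1} dot product. The accumulator is kept at 16 bits instead of the
    usual 32, which is safe because the popcount is bounded by n.
    """
    xnor = ~(x_bits ^ w_bits) & 1                    # 1 where signs agree
    popc = np.sum(xnor, axis=-1, dtype=np.int16)     # narrow accumulator
    return 2 * popc - np.int16(n)

def clip_to_int8(acc):
    """Clipping block: shrink the inter-layer data-width from 16/32 to 8 bits."""
    return np.clip(acc, -128, 127).astype(np.int8)

def folded_bn_sign(acc, threshold):
    """BatchNorm followed by sign() folds into a per-channel integer
    threshold comparison, removing floating-point BN from inference."""
    return (acc >= threshold).astype(np.uint8)

rng = np.random.default_rng(0)
n, out_ch = 256, 64
x_bits = binarize(rng.standard_normal(n))
w_bits = binarize(rng.standard_normal((out_ch, n)))
acc = binary_dense(x_bits, w_bits, n)                # int16, one value per channel
out_bits = folded_bn_sign(acc, rng.integers(-8, 8, out_ch).astype(np.int16))
narrow = clip_to_int8(acc)                           # 8-bit activations for the next stage
```

On ARM the popcount would typically map to NEON bit-count instructions over bit-packed words, but the sketch shows where the three ideas sit: a narrower accumulator inside the layer, a clipping block that hands 8-bit data to the next stage, and a BatchNorm that disappears into an integer threshold.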
Related papers
- BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration [9.092712730883887]
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable within bit-serial deep learning accelerators.
In this work, we improve the practicality and efficiency of bit-level sparsity through a novel algorithmic bit-pruning, averaging, and compression method.
On the hardware side, we demonstrate the potential of BBS through BitVert, a bit-serial architecture with an efficient PE design to accelerate DNNs with low overhead.
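As a generic illustration of what "skipping ineffectual zero-bit operations" means in a bit-serial setting (a sketch of the general idea, not of BBS itself; the function name and the 8-bit weight width are assumptions):

```python
import numpy as np

def bit_serial_dot(weights, activations, n_bits=8):
    """Bit-serial dot product that only issues work for non-zero weight bits.

    Each unsigned weight contributes shift-and-add terms, one per set bit;
    zero bits are ineffectual and are simply skipped.
    """
    acc = 0
    for w, x in zip(weights, activations):
        for bit in range(n_bits):
            if (int(w) >> bit) & 1:        # ineffectual zero bits cost nothing
                acc += int(x) << bit       # effectual bit -> shift-and-add
    return acc

w = np.array([3, 0, 5, 8], dtype=np.uint8)   # sparse at the bit level
x = np.array([2, 7, 1, 1], dtype=np.int32)
assert bit_serial_dot(w, x) == int(np.dot(w.astype(np.int32), x))
```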
arXiv Detail & Related papers (2024-09-08T21:45:12Z)
- Compacting Binary Neural Networks by Sparse Kernel Selection [58.84313343190488]
This paper is motivated by a previously revealed phenomenon that the binary kernels in successful BNNs are nearly power-law distributed.
We develop the Permutation Straight-Through Estimator (PSTE), which not only optimizes the selection process end-to-end but also maintains the non-repetitive occupancy of selected codewords.
Experiments verify that our method reduces both the model size and bit-wise computational costs, and achieves accuracy improvements compared with state-of-the-art BNNs under comparable budgets.
arXiv Detail & Related papers (2023-03-25T13:53:02Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- 8-bit Optimizers via Block-wise Quantization [57.25800395197516]
Stateful optimizers maintain gradient statistics over time, e.g., the exponentially smoothed sum (SGD with momentum) or squared sum (Adam) of past gradient values.
This state can be used to accelerate optimization compared to plain gradient descent but uses memory that might otherwise be allocated to model parameters.
In this paper, we develop the first optimizers that use 8-bit statistics while maintaining the performance levels of using 32-bit optimizer states.
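A minimal sketch of the block-wise idea, assuming simple per-block absmax linear 8-bit codes (the paper itself uses a dynamic quantization codebook; the names and block size here are illustrative):

```python
import numpy as np

def quantize_blockwise(state, block=2048):
    """Quantize an optimizer-state tensor to int8 codes with per-block scales."""
    flat = state.ravel().astype(np.float32)
    pad = (-flat.size) % block
    blocks = np.pad(flat, (0, pad)).reshape(-1, block)
    scale = np.abs(blocks).max(axis=1, keepdims=True) + 1e-12   # per-block absmax
    codes = np.round(blocks / scale * 127).astype(np.int8)      # one int8 per value
    return codes, scale

def dequantize_blockwise(codes, scale, shape):
    """Recover a float32 approximation of the original state tensor."""
    blocks = codes.astype(np.float32) / 127 * scale
    return blocks.ravel()[: int(np.prod(shape))].reshape(shape)

rng = np.random.default_rng(0)
m = rng.standard_normal((1000, 300), dtype=np.float32)   # e.g. a first-moment buffer
codes, scale = quantize_blockwise(m)
m_hat = dequantize_blockwise(codes, scale, m.shape)
print("max abs reconstruction error:", np.abs(m - m_hat).max())
```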
arXiv Detail & Related papers (2021-10-06T15:43:20Z)
- Distribution-sensitive Information Retention for Accurate Binary Neural Network [49.971345958676196]
We present a novel Distribution-sensitive Information Retention Network (DIR-Net) to retain the information of the forward activations and backward gradients.
Our DIR-Net consistently outperforms the SOTA binarization approaches under mainstream and compact architectures.
We deploy our DIR-Net on real-world resource-limited devices, achieving an 11.1x storage saving and a 5.4x speedup.
arXiv Detail & Related papers (2021-09-25T10:59:39Z)
- FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that the binarized convolution process exhibits increasing linearity towards the target of minimizing the quantization error, which in turn hampers the BNN's discriminative ability.
We re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z)
- SoFAr: Shortcut-based Fractal Architectures for Binary Convolutional Neural Networks [7.753767947048147]
We propose two Shortcut-based Fractal Architectures (SoFAr) specifically designed for Binary Convolutional Neural Networks (BCNNs).
Our proposed SoFAr combines shortcuts and fractal architectures in one unified model, which helps the training of BCNNs.
Results show that our proposed SoFAr achieves better accuracy compared with shortcut-based BCNNs.
arXiv Detail & Related papers (2020-09-11T10:00:47Z)
- Distillation Guided Residual Learning for Binary Convolutional Neural Networks [83.6169936912264]
It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating-point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and the FCNN.
To minimize the performance gap, we enforce the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to a more effective optimization of the BCNN.
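A minimal sketch of such a block-wise distillation term; the squared-error form and all names are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def blockwise_distillation_loss(bcnn_feats, fcnn_feats):
    """Sum of per-block mean squared errors between matching feature maps
    of the binary student and the full-precision teacher."""
    assert len(bcnn_feats) == len(fcnn_feats)
    loss = 0.0
    for fb, ff in zip(bcnn_feats, fcnn_feats):
        loss += np.mean((fb - ff) ** 2)      # one term per convolutional block
    return loss

# Toy example: three blocks with matching feature-map shapes
rng = np.random.default_rng(0)
student = [rng.standard_normal((8, 32, 32)) for _ in range(3)]
teacher = [rng.standard_normal((8, 32, 32)) for _ in range(3)]
print(blockwise_distillation_loss(student, teacher))
```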
arXiv Detail & Related papers (2020-07-10T07:55:39Z)