BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to
Real-Network Performance
- URL: http://arxiv.org/abs/2211.06987v1
- Date: Sun, 13 Nov 2022 18:31:45 GMT
- Title: BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to
Real-Network Performance
- Authors: Haotong Qin, Xudong Ma, Yifu Ding, Xiaoyang Li, Yang Zhang, Zejun Ma,
Jiakai Wang, Jie Luo, Xianglong Liu
- Abstract summary: Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it toward real-network accuracy.
We highlight that benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
- Score: 54.214426436283134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks, such as the Deep-FSMN, have been widely studied for
keyword spotting (KWS) applications while suffering from expensive computation and
storage. Therefore, network compression technologies like binarization are
studied to deploy KWS models on edge devices. In this paper, we present a strong yet
efficient binary neural network for KWS, namely BiFSMNv2, pushing it toward
real-network accuracy. First, we present a Dual-scale Thinnable
1-bit-Architecture to recover the representation capability of the binarized
computation units by dual-scale activation binarization and liberate the
speedup potential from an overall architecture perspective. Second, we
construct a Frequency Independent Distillation scheme for KWS
binarization-aware training, which distills the high and low-frequency
components independently to mitigate the information mismatch between
full-precision and binarized representations. Moreover, we implement BiFSMNv2
on ARMv8 real-world hardware with a novel Fast Bitwise Computation Kernel,
which is proposed to fully utilize registers and increase instruction
throughput. Comprehensive experiments show our BiFSMNv2 outperforms existing
binary networks for KWS by convincing margins across different datasets and
even achieves accuracy comparable to the full-precision networks (e.g., only a
1.59% drop on Speech Commands V1-12). We highlight that, benefiting from the
compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an
impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
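
To make the Dual-scale Thinnable 1-bit-Architecture more concrete, here is a minimal, hedged sketch of dual-scale activation binarization. The function name `binarize_dual_scale`, the choice of a tensor-level plus a per-row scale, and the equal treatment of the two reconstructions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def binarize_dual_scale(x, eps=1e-8):
    """Sketch of dual-scale activation binarization (assumed formulation).

    Activations are binarized with sign(); two scaling factors are kept so a
    thinnable path can choose either a cheap coarse reconstruction or a
    higher-fidelity fine one.
    """
    sign = np.where(x >= 0.0, 1.0, -1.0)                        # 1-bit codes
    scale_coarse = np.abs(x).mean() + eps                       # tensor-level scale
    scale_fine = np.abs(x).mean(axis=-1, keepdims=True) + eps   # per-row scale
    return sign * scale_coarse, sign * scale_fine

x = np.random.randn(4, 8).astype(np.float32)
coarse, fine = binarize_dual_scale(x)
print("coarse error:", np.abs(x - coarse).mean(), "fine error:", np.abs(x - fine).mean())
```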
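The Frequency Independent Distillation scheme distills high- and low-frequency components separately. The sketch below is an assumed illustration of that idea: it splits student and teacher features with an rFFT, using a hypothetical `cutoff` fraction to decide which bins count as low frequency, and sums two independent MSE terms; the paper's actual decomposition and loss weighting may differ.

```python
import numpy as np

def split_frequency(feat, cutoff=0.25):
    """Split features along the last axis into low-/high-frequency parts.

    The rFFT-based split and the `cutoff` fraction are assumptions made for
    illustration only.
    """
    spec = np.fft.rfft(feat, axis=-1)
    k = max(1, int(cutoff * spec.shape[-1]))
    low_spec, high_spec = spec.copy(), spec.copy()
    low_spec[..., k:] = 0.0      # keep only low-frequency bins
    high_spec[..., :k] = 0.0     # keep only high-frequency bins
    low = np.fft.irfft(low_spec, n=feat.shape[-1], axis=-1)
    high = np.fft.irfft(high_spec, n=feat.shape[-1], axis=-1)
    return low, high

def frequency_independent_distill_loss(student_feat, teacher_feat):
    """Distill low- and high-frequency components with independent MSE terms."""
    s_low, s_high = split_frequency(student_feat)
    t_low, t_high = split_frequency(teacher_feat)
    return np.mean((s_low - t_low) ** 2) + np.mean((s_high - t_high) ** 2)

student = np.random.randn(2, 16)
teacher = np.random.randn(2, 16)
print(frequency_independent_distill_loss(student, teacher))
```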
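The Fast Bitwise Computation Kernel itself is an ARMv8-specific routine that packs registers and raises instruction throughput, but the arithmetic it accelerates is the standard XNOR-plus-popcount dot product of binarized vectors. The pure-Python sketch below only illustrates that arithmetic, with hypothetical helpers `pack_bits` and `binary_dot`; it does not reproduce the actual NEON kernel.

```python
def pack_bits(signs):
    """Pack a sequence of +1/-1 values into an integer bitmask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word

def binary_dot(a_word, b_word, n):
    """Dot product of two +/-1 vectors via XNOR + popcount on their bitmasks."""
    xnor = ~(a_word ^ b_word) & ((1 << n) - 1)   # bit set where the signs match
    matches = bin(xnor).count("1")               # popcount
    return 2 * matches - n                       # (#matches) - (#mismatches)

a = [+1, -1, +1, +1, -1, -1, +1, -1]
b = [+1, +1, -1, +1, -1, +1, +1, -1]
assert binary_dot(pack_bits(a), pack_bits(b), len(a)) == sum(x * y for x, y in zip(a, b))
```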
Related papers
- Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z)
- Input Layer Binarization with Bit-Plane Encoding [4.872439392746007]
We present a new method to binarize the first layer using directly the 8-bit representation of input data.
The resulting model is fully binarized and our first layer binarization approach is model independent.
arXiv Detail & Related papers (2023-05-04T14:49:07Z)
- BiBench: Benchmarking and Analyzing Network Binarization [72.59760752906757]
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings.
Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood.
We present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.
arXiv Detail & Related papers (2023-01-26T17:17:16Z)
- BiFSMN: Binary Neural Network for Keyword Spotting [47.46397208920726]
BiFSMN is an accurate and extremely efficient binary neural network for KWS.
We show that BiFSMN can achieve an impressive 22.3x speedup and 15.5x storage-saving on real-world edge hardware.
arXiv Detail & Related papers (2022-02-14T05:16:53Z)
- High-Capacity Expert Binary Networks [56.87581500474093]
Network binarization is a promising hardware-aware direction for creating efficient deep models.
Despite its memory and computational advantages, reducing the accuracy gap between binary models and their real-valued counterparts remains an unsolved challenging research problem.
We propose Expert Binary Convolution, which, for the first time, tailors conditional computing to binary networks by learning to select one data-specific expert binary filter at a time conditioned on input features.
arXiv Detail & Related papers (2020-10-07T17:58:10Z)
- SoFAr: Shortcut-based Fractal Architectures for Binary Convolutional Neural Networks [7.753767947048147]
We propose two Shortcut-based Fractal Architectures (SoFAr) specifically designed for Binary Convolutional Neural Networks (BCNNs).
Our proposed SoFAr combines the adoption of shortcuts and the fractal architectures in one unified model, which is helpful in the training of BCNNs.
Results show that our proposed SoFAr achieves better accuracy compared with shortcut-based BCNNs.
arXiv Detail & Related papers (2020-09-11T10:00:47Z)
- Training Binary Neural Networks with Real-to-Binary Convolutions [52.91164959767517]
We show how to train binary networks to within a few percentage points of their full-precision counterparts.
We show how to build a strong baseline, which already achieves state-of-the-art accuracy.
We show that, when putting all of our improvements together, the proposed model beats the current state of the art by more than 5% top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2020-03-25T17:54:38Z)
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms all state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z)