FATNN: Fast and Accurate Ternary Neural Networks
- URL: http://arxiv.org/abs/2008.05101v4
- Date: Thu, 29 Jul 2021 11:50:10 GMT
- Title: FATNN: Fast and Accurate Ternary Neural Networks
- Authors: Peng Chen, Bohan Zhuang, Chunhua Shen
- Abstract summary: Ternary Neural Networks (TNNs) have received much attention because they are potentially orders of magnitude faster in inference, as well as more power efficient, than their full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We carefully design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
- Score: 89.07796377047619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ternary Neural Networks (TNNs) have received much attention because they are
potentially orders of magnitude faster in inference, as well as more power
efficient, than their full-precision counterparts. However, 2 bits are required
to encode the ternary representation even though only 3 quantization levels are
used. As a result, conventional TNNs have similar memory consumption and speed
to standard 2-bit models, but worse representational capability. Moreover, there
is still a significant accuracy gap between TNNs and full-precision networks,
hampering their deployment in real applications. To tackle these two challenges,
in this work, we first show that, under some mild constraints, the computational
complexity of the ternary inner product can be reduced by a factor of 2. Second,
to mitigate the performance gap, we carefully design an implementation-dependent
ternary quantization algorithm. The proposed framework is termed Fast and
Accurate Ternary Neural Networks (FATNN). Experiments on image classification
demonstrate that FATNN surpasses the state of the art in accuracy by a
significant margin. More importantly, we analyze its speedup against various
precisions on several platforms, which serves as a strong benchmark for further
research.
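For reference, the conventional way to evaluate a ternary inner product on 2-bit data is with bit-plane masks and popcounts, which costs four AND/popcount pairs per machine word; the factor-of-2 reduction claimed above is relative to this kind of baseline. The following is a minimal Python sketch of that baseline only; the (pos, neg) bit-plane codec and the `encode_ternary`/`ternary_dot` helpers are illustrative assumptions, not the actual FATNN encoding.

```python
# Illustrative bit-plane ternary inner product (not the FATNN codec).
# Each ternary value v in {-1, 0, +1} is stored as two bit-planes
# (pos, neg) with v = pos - neg, which needs 2 bits per value.

def encode_ternary(values):
    """Pack a list of ternary values {-1, 0, +1} into two bitmasks."""
    pos, neg = 0, 0
    for i, v in enumerate(values):
        if v == 1:
            pos |= 1 << i
        elif v == -1:
            neg |= 1 << i
    return pos, neg

def ternary_dot(x, w):
    """Inner product of two ternary vectors given as (pos, neg) bitmasks.

    The sum of x_i * w_i decomposes into four AND/popcount terms because
    x_i * w_i = xp*wp + xn*wn - xp*wn - xn*wp for bit-plane encodings.
    """
    xp, xn = x
    wp, wn = w
    popcount = lambda m: bin(m).count("1")
    return (popcount(xp & wp) + popcount(xn & wn)
            - popcount(xp & wn) - popcount(xn & wp))

# Example: [-1, 0, 1, 1] . [1, -1, 1, 0] = -1 + 0 + 1 + 0 = 0
x = encode_ternary([-1, 0, 1, 1])
w = encode_ternary([1, -1, 1, 0])
assert ternary_dot(x, w) == 0
```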
Related papers
- Noise Adaptor in Spiking Neural Networks [4.568827262994048]
Low-latency spiking neural network (SNN) algorithms have drawn significant interest.
One of the most efficient ways to construct a low-latency SNN is by converting a pre-trained, low-bit artificial neural network (ANN) into an SNN.
However, converting SNNs from low-bit ANNs can lead to "occasional noise" -- the phenomenon where spikes are occasionally generated in spiking neurons where they should not be.
arXiv Detail & Related papers (2023-12-08T16:57:01Z) - QVIP: An ILP-based Formal Verification Approach for Quantized Neural
Networks [14.766917269393865]
Quantization has emerged as a promising technique to reduce the size of neural networks while retaining accuracy comparable to their floating-point counterparts.
We propose a novel and efficient formal verification approach for QNNs.
In particular, we are the first to propose an encoding that reduces the verification problem of QNNs to solving integer linear constraints.
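To make the idea of reducing QNN verification to integer linear constraints concrete, here is a toy sketch of a robustness query over a single integer-weight layer, posed as an ILP feasibility problem with PuLP. The two-logit "network", its weights and bounds, and the whole formulation are illustrative assumptions, not QVIP's actual encoding.

```python
# Toy sketch: quantized inputs and integer weights let a robustness query be
# posed as integer linear constraints (made-up network, not QVIP's encoding).
from pulp import LpProblem, LpVariable, LpMinimize, LpStatus, lpSum, value

x0 = [120, 30, 200]          # reference (quantized) input, predicted class 0
eps = 2                      # allowed L-infinity perturbation in integer steps
W = [[3, -1, 2],             # integer weights of logit 0
     [1,  2, 1]]             # integer weights of logit 1
b = [5, -4]

prob = LpProblem("qnn_robustness", LpMinimize)

# Integer input variables restricted to the quantization grid and the eps-ball.
xs = [LpVariable(f"x{i}", lowBound=max(0, x0[i] - eps),
                 upBound=min(255, x0[i] + eps), cat="Integer")
      for i in range(3)]

logit0 = lpSum(W[0][i] * xs[i] for i in range(3)) + b[0]
logit1 = lpSum(W[1][i] * xs[i] for i in range(3)) + b[1]

prob += xs[0]                  # dummy objective; we only care about feasibility
prob += logit1 - logit0 >= 0   # adversarial condition: class 1 wins or ties

prob.solve()
if LpStatus[prob.status] == "Optimal":
    print("counterexample:", [int(value(v)) for v in xs])
else:
    print("robust within the eps-ball (for this toy encoding)")
```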
arXiv Detail & Related papers (2022-12-10T03:00:29Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
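The "bilinear relationship" refers to the way a binary layer is commonly parameterized as W ≈ alpha * sign(W), so the reconstruction error depends jointly on the real-valued weights and the scale factors. The sketch below shows only the standard XNOR-Net-style closed form for the scales, to make that coupling concrete; it is not RBONN's recurrent bilinear optimizer.

```python
# Minimal numpy sketch of the bilinear coupling behind scaled binarization:
# W is approximated as alpha * sign(W), with one scale per filter.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3 * 3 * 16))   # real-valued weights, one row per filter

B = np.sign(W)                           # binary weights in {-1, +1}
B[B == 0] = 1

# For fixed B, the alpha minimizing ||W - alpha * B||^2 per filter is the
# mean absolute weight of that filter (XNOR-Net-style closed form).
alpha = np.abs(W).mean(axis=1, keepdims=True)

reconstruction = alpha * B
error = np.linalg.norm(W - reconstruction) / np.linalg.norm(W)
print(f"relative reconstruction error: {error:.3f}")
```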
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - Towards Lossless ANN-SNN Conversion under Ultra-Low Latency with Dual-Phase Optimization [30.098268054714048]
Spiking neural networks (SNNs) operating with asynchronous discrete events show higher energy efficiency with sparse computation.
A popular approach for implementing deep SNNs is ANN-SNN conversion combining both efficient training of ANNs and efficient inference of SNNs.
In this paper, we first identify that the performance degradation of converted SNNs under low latency stems from the misrepresentation of the negative or overflow residual membrane potential in SNNs.
Inspired by this, we decompose the conversion error into three parts: quantization error, clipping error, and residual membrane potential representation error.
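The residual-membrane-potential error can be seen in a few lines of simulation: an integrate-and-fire neuron driven for a finite number of time steps approximates an ANN activation by its firing rate, but charge left in (or overflowing) the membrane at the end is never reported as spikes. The neuron model and constants below are illustrative, not the paper's dual-phase optimization.

```python
# Illustrative integrate-and-fire (IF) neuron as used in ANN-SNN conversion.
# Over T steps the spike rate approximates clip(a, 0, threshold) / threshold;
# whatever potential remains at the end is one source of conversion error.

def if_neuron_rate(a, threshold=1.0, T=8):
    """Simulate an IF neuron driven by constant input 'a' for T steps."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += a                  # integrate the constant input current
        if v >= threshold:      # fire and reset by subtraction
            spikes += 1
            v -= threshold
    return spikes / T, v

for a in (0.3, 0.37, 1.4):      # the last value also shows overflow/clipping
    rate, residual = if_neuron_rate(a, T=8)
    print(f"input {a:4.2f} -> spike rate {rate:.3f}, residual potential {residual:+.2f}")
```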
arXiv Detail & Related papers (2022-05-16T06:53:14Z) - Spatial-Temporal-Fusion BNN: Variational Bayesian Feature Layer [77.78479877473899]
We design a spatial-temporal-fusion BNN for efficiently scaling BNNs to large models.
Compared to vanilla BNNs, our approach greatly reduces the training time and the number of parameters, which helps scale BNNs efficiently.
arXiv Detail & Related papers (2021-12-12T17:13:14Z) - Sub-bit Neural Networks: Learning to Compress and Accelerate Binary
Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm that distills knowledge from real-valued networks into binary networks at the level of the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
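The generic mechanism of distilling on the final prediction distribution is a KL-divergence loss between the real-valued network's softened predictions and the binary network's. The sketch below shows only that generic loss; the temperature, random logits, and weighting are illustrative assumptions, not the exact S2-BNN guided-calibration recipe.

```python
# Minimal numpy sketch of prediction-distribution distillation via KL divergence
# (generic mechanism only; not the exact S2-BNN recipe).
import numpy as np

def softmax(z, tau=1.0):
    z = z / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_distill_loss(teacher_logits, student_logits, tau=2.0):
    """KL(teacher || student) averaged over the batch, with temperature tau."""
    p = softmax(teacher_logits, tau)   # real-valued network's target distribution
    q = softmax(student_logits, tau)   # binary network's prediction
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))                        # real-valued logits
student = teacher + rng.normal(scale=2.0, size=(4, 10))   # noisier binary-net logits

print("distillation loss:", round(kl_distill_loss(teacher, student), 4))
```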
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely low computation and storage overhead, but their performance is still worse than that of full-precision networks.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks onto high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)