Signed Binary Weight Networks
- URL: http://arxiv.org/abs/2211.13838v3
- Date: Tue, 5 Dec 2023 03:55:38 GMT
- Title: Signed Binary Weight Networks
- Authors: Sachit Kuhar, Alexey Tumanov, Judy Hoffman
- Abstract summary: Two important algorithmic techniques have shown promise for enabling efficient inference - sparsity and binarization.
We propose a new method called signed-binary networks to improve efficiency further.
Our method achieves accuracy comparable to binary methods on ImageNet and CIFAR10 and can lead to 69% weight sparsity.
- Score: 17.07866119979333
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Efficient inference of Deep Neural Networks (DNNs) is essential to making AI
ubiquitous. Two important algorithmic techniques have shown promise for
enabling efficient inference - sparsity and binarization. These techniques
translate into weight sparsity and weight repetition at the hardware-software
level enabling the deployment of DNNs with critically low power and latency
requirements. We propose a new method called signed-binary networks to improve
efficiency further (by exploiting both weight sparsity and weight repetition
together) while maintaining similar accuracy. Our method achieves accuracy
comparable to binary methods on the ImageNet and CIFAR10 datasets and can lead
to 69% weight sparsity. We observe real speedups when deploying these models on
general-purpose devices and show that this high percentage of unstructured
sparsity can lead to a further reduction in energy consumption on ASICs.
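The abstract describes signed-binary weights only at a high level, so the sketch below is a hypothetical illustration of such a quantizer rather than the paper's exact training procedure. The per-filter sign heuristic, the mean-magnitude scaling factor, and the 69% sparsity knob are assumptions chosen to make the idea concrete: each filter ends up with a single non-zero magnitude (weight repetition) and mostly zero entries (weight sparsity).

```python
import torch

def signed_binary_quantize(weight: torch.Tensor, sparsity: float = 0.69) -> torch.Tensor:
    """Map each output filter of `weight` to values in {0, +alpha} or {0, -alpha}."""
    flat = weight.reshape(weight.shape[0], -1)
    # Each filter commits to a single sign (assumed heuristic: sign of its weight sum).
    filter_sign = torch.sign(flat.sum(dim=1, keepdim=True))
    filter_sign[filter_sign == 0] = 1.0
    # Keep only the largest-magnitude (1 - sparsity) fraction of entries per filter.
    k = max(1, int(round((1.0 - sparsity) * flat.shape[1])))
    thresh = flat.abs().topk(k, dim=1).values[:, -1:]
    mask = (flat.abs() >= thresh).float()
    # One scaling factor per filter: mean magnitude of the surviving weights.
    alpha = (flat.abs() * mask).sum(dim=1, keepdim=True) / mask.sum(dim=1, keepdim=True)
    q = filter_sign * alpha * mask
    return q.reshape_as(weight)

w = torch.randn(64, 3, 3, 3)                  # e.g. a small conv layer's weights
wq = signed_binary_quantize(w)
print((wq == 0).float().mean())               # fraction of zeros, roughly 0.69
print(torch.unique(wq[0]))                    # first filter uses only 0 and one signed alpha
```

Restricting every filter to one signed value is what lets repetition-aware and sparsity-aware hardware optimizations apply at the same time, which is the efficiency argument the abstract makes.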
Related papers
- Signed Binarization: Unlocking Efficiency Through Repetition-Sparsity
Trade-Off [2.6144163646666945]
This paper introduces the concept of a repetition-sparsity trade-off that helps explain computational efficiency during inference.
We propose Signed Binarization, a unified co-design framework that integrates hardware-software systems, quantization functions, and representation learning techniques to address this trade-off.
Our approach achieves a 26% speedup on real hardware, doubles energy efficiency, and reduces density by 2.8x compared to binary methods for ResNet 18.
arXiv Detail & Related papers (2023-12-04T02:33:53Z) - Dynamic Early Exiting Predictive Coding Neural Networks [3.542013483233133]
With the push for smaller and more accurate devices, deep learning models have become too heavy to deploy.
We propose a shallow bidirectional network based on predictive coding theory and dynamic early exiting for halting further computations.
We achieve comparable accuracy to VGG-16 in image classification on CIFAR-10 with fewer parameters and less computational complexity.
arXiv Detail & Related papers (2023-09-05T08:00:01Z) - BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to
Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, which pushes binary networks to real-network accuracy.
We highlight that benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
arXiv Detail & Related papers (2022-11-13T18:31:45Z) - Energy Efficient Hardware Acceleration of Neural Networks with
Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version (a generic PoT quantizer is sketched after this list).
arXiv Detail & Related papers (2022-09-30T06:33:40Z) - BiFSMN: Binary Neural Network for Keyword Spotting [47.46397208920726]
BiFSMN is an accurate and extremely efficient binary neural network for KWS.
We show that BiFSMN can achieve an impressive 22.3x speedup and 15.5x storage-saving on real-world edge hardware.
arXiv Detail & Related papers (2022-02-14T05:16:53Z) - Two Sparsities Are Better Than One: Unlocking the Performance Benefits
of Sparse-Sparse Networks [0.0]
We introduce Complementary Sparsity, a technique that significantly improves the performance of dual sparse networks on existing hardware.
We show up to a 100x improvement in throughput and energy efficiency when performing inference on FPGAs.
Our results suggest that weight plus activation sparsity can be a potent combination for efficiently scaling future AI models.
arXiv Detail & Related papers (2021-12-27T20:41:01Z) - Distribution-sensitive Information Retention for Accurate Binary Neural
Network [49.971345958676196]
We present a novel Distribution-sensitive Information Retention Network (DIR-Net) to retain the information of the forward activations and backward gradients.
Our DIR-Net consistently outperforms the SOTA binarization approaches under mainstream and compact architectures.
We deploy DIR-Net on real-world resource-limited devices, achieving 11.1x storage savings and a 5.4x speedup.
arXiv Detail & Related papers (2021-09-25T10:59:39Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and
Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, requiring external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reductions in storage and training energy, respectively, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z) - ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
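The power-of-two quantisation entry above points to this sketch. It is a generic, hypothetical PoT quantizer written for illustration only; the exponent range and the flush-to-zero threshold are assumptions, not details taken from that paper. The appeal of PoT weights is that multiplying an activation by +/-2^e reduces to a sign flip and a bit shift, which is where the reported FPGA energy savings come from.

```python
import numpy as np

def pot_quantize(w: np.ndarray, min_exp: int = -6, max_exp: int = 0) -> np.ndarray:
    """Round each weight to the nearest signed power of two, sign(w) * 2**e."""
    sign = np.sign(w)
    mag = np.abs(w)
    # Nearest exponent in log space, clipped to the assumed representable range.
    safe_mag = np.maximum(mag, 2.0 ** (min_exp - 1))   # avoid log2(0)
    exp = np.clip(np.round(np.log2(safe_mag)), min_exp, max_exp)
    q = sign * 2.0 ** exp
    # Flush weights below the smallest representable magnitude to zero.
    q[mag < 2.0 ** (min_exp - 1)] = 0.0
    return q

w = 0.3 * np.random.randn(4, 4).astype(np.float32)
print(pot_quantize(w))   # every non-zero entry is +/- a power of two
```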