Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks
- URL: http://arxiv.org/abs/2112.13896v1
- Date: Mon, 27 Dec 2021 20:41:01 GMT
- Title: Two Sparsities Are Better Than One: Unlocking the Performance Benefits of Sparse-Sparse Networks
- Authors: Kevin Lee Hunter, Lawrence Spracklen and Subutai Ahmad
- Abstract summary: We introduce Complementary Sparsity, a technique that significantly improves the performance of dual sparse networks on existing hardware.
We show up to 100X improvement in throughput and energy efficiency when performing inference on FPGAs.
Our results suggest that weight plus activation sparsity can be a potent combination for efficiently scaling future AI models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In principle, sparse neural networks should be significantly more efficient
than traditional dense networks. Neurons in the brain exhibit two types of
sparsity; they are sparsely interconnected and sparsely active. These two types
of sparsity, called weight sparsity and activation sparsity, when combined,
offer the potential to reduce the computational cost of neural networks by two
orders of magnitude. Despite this potential, today's neural networks deliver
only modest performance benefits using just weight sparsity, because
traditional computing hardware cannot efficiently process sparse networks. In
this article we introduce Complementary Sparsity, a novel technique that
significantly improves the performance of dual sparse networks on existing
hardware. We demonstrate that we can achieve high performance running
weight-sparse networks, and we can multiply those speedups by incorporating
activation sparsity. Using Complementary Sparsity, we show up to 100X
improvement in throughput and energy efficiency when performing inference on FPGAs.
We analyze scalability and resource tradeoffs for a variety of kernels typical
of commercial convolutional networks such as ResNet-50 and MobileNetV2. Our
results with Complementary Sparsity suggest that weight plus activation
sparsity can be a potent combination for efficiently scaling future AI models.
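To make the arithmetic behind the "two orders of magnitude" claim concrete, here is a minimal NumPy sketch of a generic sparse-sparse linear layer (an illustration of the combined-sparsity idea, not the paper's Complementary Sparsity kernel): only positions where both the weight and the activation are nonzero contribute, so the multiply count scales roughly with the product of the two densities. The sizes and 10% densities below are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and densities, chosen for illustration only.
n_in, n_out = 1024, 256
weight_density, activation_density = 0.1, 0.1   # ~10% nonzero each

# Random sparse weight matrix, stored row-wise as (column indices, values).
W = rng.standard_normal((n_out, n_in)) * (rng.random((n_out, n_in)) < weight_density)
w_rows = []
for j in range(n_out):
    cols = np.flatnonzero(W[j])              # nonzero weight positions in row j
    w_rows.append((cols, W[j, cols]))

# Random sparse activation vector.
x = rng.standard_normal(n_in) * (rng.random(n_in) < activation_density)
active = set(np.flatnonzero(x))              # indices of nonzero activations

# Sparse-sparse matvec: only index pairs where BOTH operands are nonzero
# contribute, so the multiply count scales with the product of the densities.
y = np.zeros(n_out)
mults = 0
for j, (cols, vals) in enumerate(w_rows):
    for c, v in zip(cols, vals):
        if c in active:                      # skip zero activations entirely
            y[j] += v * x[c]
            mults += 1

dense_mults = n_in * n_out
print(f"multiplies: {mults} vs dense {dense_mults} "
      f"(~{dense_mults / max(mults, 1):.0f}x fewer)")
assert np.allclose(y, W @ x)                 # same result as the dense computation
```

At 10% weight density and 10% activation density, roughly 1% of the dense multiplies remain, which is where the potential 100X reduction comes from.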
Related papers
- Signed Binary Weight Networks [17.07866119979333]
Two important algorithmic techniques have shown promise for enabling efficient inference - sparsity and binarization.
We propose a new method called signed-binary networks to improve efficiency further.
Our method achieves accuracy comparable to binary networks on the ImageNet and CIFAR10 datasets and can lead to 69% sparsity.
arXiv Detail & Related papers (2022-11-25T00:19:21Z)
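For the signed-binary idea above, a hedged sketch of the general recipe: combine binarization with sparsity by quantizing weights to {-1, 0, +1}, so most multiplications vanish (zeros) and the rest reduce to sign flips. The threshold and per-tensor scale below are illustrative assumptions, not the cited paper's exact scheme.

```python
import numpy as np

def signed_binary_quantize(w, threshold=0.05):
    """Hedged sketch: weights below the threshold become 0 (sparsity),
    the rest keep only their sign (binarization). The threshold and the
    per-tensor scale are assumptions, not the paper's exact method."""
    q = np.sign(w) * (np.abs(w) > threshold)
    scale = np.abs(w[q != 0]).mean() if np.any(q) else 1.0
    return q.astype(np.int8), float(scale)

w = np.random.default_rng(1).standard_normal(10_000) * 0.1
q, scale = signed_binary_quantize(w)
print("weight sparsity:", float(np.mean(q == 0)))   # fraction of zero weights
print("values used:", np.unique(q))                 # [-1, 0, 1]
```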
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniform quantisation version.
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
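Power-of-two (PoT) quantisation constrains weight magnitudes to exact powers of two, so each multiply becomes a bit shift; that is what makes the FPGA implementation above cheap. A minimal sketch follows; the exponent range is an illustrative assumption, not the cited accelerator's configuration.

```python
import numpy as np

def pot_quantize(w, min_exp=-8, max_exp=0):
    """Quantise each weight magnitude to a power of two by rounding its
    base-2 log (illustrative exponent range). A product w * x then becomes
    a shift of x by |exponent| bits in fixed-point hardware."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, max_exp)
    return sign * np.exp2(exp)

w = np.random.default_rng(2).uniform(-1.0, 1.0, 8)
print(np.round(w, 3))            # original weights
print(pot_quantize(w))           # magnitudes are now 2^k for integer k
```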
- ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks [42.28659737268829]
ShiftAddNAS can automatically search for more accurate and more efficient NNs.
ShiftAddNAS integrates the first hybrid search space that incorporates both multiplication-based and multiplication-free operators.
Experiments and ablation studies consistently validate the efficacy of ShiftAddNAS.
arXiv Detail & Related papers (2022-05-17T06:40:13Z)
- SONIC: A Sparse Neural Network Inference Accelerator with Silicon Photonics for Energy-Efficient Deep Learning [4.286327408435937]
We propose a novel silicon photonics-based sparse neural network inference accelerator called SONIC.
SONIC can achieve up to 5.8x better performance-per-watt and 8.4x lower energy-per-bit than state-of-the-art sparse electronic neural network accelerators.
arXiv Detail & Related papers (2021-09-09T17:57:09Z)
- S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration [21.110711058376534]
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network (CNN) inference on mobile devices.
We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations.
We describe S2TA, a systolic array-based CNN accelerator that exploits joint weight and activation DBB sparsity.
arXiv Detail & Related papers (2021-07-16T15:57:06Z)
- FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z)
- CondenseNet V2: Sparse Feature Reactivation for Deep Networks [87.38447745642479]
Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency.
We propose an alternative approach named sparse feature reactivation (SFR), aiming at actively increasing the utility of features for reuse.
Our experiments show that the proposed models achieve promising performance on image classification (ImageNet and CIFAR) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
arXiv Detail & Related papers (2021-04-09T14:12:43Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
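N:M fine-grained structured sparsity keeps at most N nonzero weights in every group of M consecutive weights (2:4 is the variant accelerated by recent GPUs). The sketch below derives a 2:4 mask by weight magnitude; the group size and N are standard example values, and training the mask from scratch, as the cited paper does, is not shown.

```python
import numpy as np

def nm_prune(w, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights along the last axis and zero the rest (magnitude-based mask;
    the cited paper learns such masks during training from scratch)."""
    groups = w.reshape(-1, m)
    # indices of the (m - n) smallest magnitudes in each group
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.random.default_rng(3).standard_normal((8, 16))
ws = nm_prune(w)
# every group of 4 consecutive weights now has at most 2 nonzeros
assert ((ws.reshape(-1, 4) != 0).sum(axis=1) <= 2).all()
print("overall sparsity:", float(np.mean(ws == 0)))    # ~0.5
```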
- Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks [78.47459801017959]
Sparsity can reduce the memory footprint of regular networks so that they fit on mobile devices.
We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice.
arXiv Detail & Related papers (2021-01-31T22:48:50Z)
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
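ShiftAddNet builds layers from two multiplication-free primitives: bit shifts (power-of-two scaling, as in the PoT sketch above) and additions. Below is a hedged sketch of the additive primitive only, computing outputs as negative L1 distances between the input and each weight vector; it illustrates the general idea rather than the paper's exact layer definitions.

```python
import numpy as np

def add_layer(x, w):
    """Additive, multiplication-free correlation in the spirit of
    ShiftAddNet's 'add' primitive: each output is the negative L1
    distance between the input and one weight vector."""
    # x: (n_in,), w: (n_out, n_in) -> (n_out,)
    return -np.abs(x[None, :] - w).sum(axis=1)

x = np.random.default_rng(4).standard_normal(16)
w = np.random.default_rng(5).standard_normal((4, 16))
print(add_layer(x, w))   # four scores computed with subtractions and additions only
```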
This list is automatically generated from the titles and abstracts of the papers on this site.