Dual-side Sparse Tensor Core
- URL: http://arxiv.org/abs/2105.09564v1
- Date: Thu, 20 May 2021 07:36:16 GMT
- Title: Dual-side Sparse Tensor Core
- Authors: Yang Wang, Chen Zhang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen
Leng
- Abstract summary: Existing GPUs can only leverage the sparsity from weights but not activations, which are dynamic, unpredictable, and hence challenging to exploit.
We propose a novel architecture to efficiently harness the dual-side sparsity (i.e., weight and activation sparsity).
Our design can fully unleash the dual-side sparsity and improve the performance by up to one order of magnitude with small hardware overhead.
- Score: 18.204976918925635
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Leveraging sparsity in deep neural network (DNN) models is promising for
accelerating model inference. Yet existing GPUs can only leverage the sparsity
from weights but not activations, which are dynamic, unpredictable, and hence
challenging to exploit. In this work, we propose a novel architecture to
efficiently harness the dual-side sparsity (i.e., weight and activation
sparsity). We take a systematic approach to understand the (dis)advantages of
previous sparsity-related architectures and propose a novel, unexplored
paradigm that combines outer-product computation primitive and bitmap-based
encoding format. We demonstrate the feasibility of our design with minimal
changes to the existing production-scale inner-product-based Tensor Core. We
propose a set of novel ISA extensions and co-design the matrix-matrix
multiplication and convolution algorithms, which are the two dominant
computation patterns in today's DNN models, to exploit our new dual-side sparse
Tensor Core. Our evaluation shows that our design can fully unleash the
dual-side DNN sparsity and improve the performance by up to one order of
magnitude with small hardware overhead.
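The abstract only names the key ingredients (an outer-product computation primitive plus a bitmap-based encoding format), so the following is a minimal NumPy sketch of that idea, not the paper's actual Tensor Core design: each column of the weight matrix and each row of the activation matrix is stored as a bitmap plus packed nonzeros, and the product is accumulated one outer product at a time. The function names, data layout, and toy check are illustrative assumptions.

```python
import numpy as np

def bitmap_compress_cols(a):
    """Compress each column of `a` into a (bitmap, packed nonzeros) pair."""
    out = []
    for k in range(a.shape[1]):
        col = a[:, k]
        bitmap = col != 0                    # one bit per row position
        out.append((bitmap, col[bitmap]))    # values stored contiguously
    return out

def bitmap_compress_rows(b):
    """Compress each row of `b` into a (bitmap, packed nonzeros) pair."""
    out = []
    for k in range(b.shape[0]):
        row = b[k, :]
        bitmap = row != 0
        out.append((bitmap, row[bitmap]))
    return out

def outer_product_spgemm(a_cols, b_rows, m, n):
    """C = sum over k of A[:, k] (outer) B[k, :], skipping zero operands.

    Step k only touches the nonzero rows of A's k-th column and the nonzero
    columns of B's k-th row, which is where dual-side (weight + activation)
    sparsity pays off.
    """
    c = np.zeros((m, n))
    for (a_bm, a_val), (b_bm, b_val) in zip(a_cols, b_rows):
        if a_val.size == 0 or b_val.size == 0:
            continue                          # the whole outer product is zero
        rows = np.flatnonzero(a_bm)           # output coordinates come from the bitmaps
        cols = np.flatnonzero(b_bm)
        c[np.ix_(rows, cols)] += np.outer(a_val, b_val)
    return c

# Toy check against a dense matmul.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16)) * (rng.random((8, 16)) < 0.3)  # sparse "weights"
B = rng.standard_normal((16, 8)) * (rng.random((16, 8)) < 0.4)  # sparse "activations"
C = outer_product_spgemm(bitmap_compress_cols(A), bitmap_compress_rows(B), 8, 8)
assert np.allclose(C, A @ B)
```

A hardware implementation would intersect the bitmaps to gather only operand pairs that yield nonzero partial products; the NumPy version is only meant to show why the multiply count scales with the product of the two densities rather than with the dense problem size.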
Related papers
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks [29.46621102184345]
We propose a framework dubbed DepthShrinker to develop hardware-friendly compact networks.
Our framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques.
arXiv Detail & Related papers (2022-06-02T02:32:47Z)
- Accelerating Sparse Deep Neural Networks [20.6942347219753]
We present the design and behavior of Sparse Tensor Cores, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units (a sketch of this pruning pattern appears after this list).
We also describe a simple workflow for training networks that both satisfy the 2:4 sparsity pattern requirements and maintain accuracy.
arXiv Detail & Related papers (2021-04-16T21:27:32Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that the binarized convolution process exhibits increasing linearity towards the target of minimizing such error, which in turn hampers the BNN's discriminative ability.
We re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z)
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose using evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
- PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices [35.90103072918056]
The deep neural network (DNN) has emerged as the most important and popular artificial intelligence (AI) technique.
The growth of model size poses a key energy efficiency challenge for the underlying computing platform.
This paper proposes PermDNN, a novel approach to generate and execute hardware-friendly structured sparse DNN models.
arXiv Detail & Related papers (2020-04-23T02:26:40Z)
- Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits [99.59941892183454]
We propose Einsum Networks (EiNets), a novel implementation design for PCs.
At their core, EiNets combine a large number of arithmetic operations in a single monolithic einsum-operation.
We show that the implementation of Expectation-Maximization (EM) can be simplified for PCs, by leveraging automatic differentiation.
arXiv Detail & Related papers (2020-04-13T23:09:15Z)
- An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
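The 2:4 pattern mentioned in the "Accelerating Sparse Deep Neural Networks" entry above, and its N:M generalization studied in the from-scratch training paper, keeps at most N nonzero weights in every group of M consecutive values. A minimal magnitude-based projection onto that pattern is sketched below; it is an illustrative NumPy version under that assumption, not the training workflow of either paper, and the function name is made up for this example.

```python
import numpy as np

def prune_n_m(weights, n=2, m=4):
    """Project a weight tensor onto N:M structured sparsity (magnitude-based).

    In every group of `m` consecutive weights along the last axis, only the
    `n` largest-magnitude entries are kept; the rest are zeroed. With n=2 and
    m=4 this is the 2:4 (50% sparse) pattern.
    """
    w = np.asarray(weights, dtype=float)
    assert w.shape[-1] % m == 0, "last dimension must be a multiple of m"
    groups = w.reshape(*w.shape[:-1], -1, m)        # (..., num_groups, m)
    order = np.argsort(np.abs(groups), axis=-1)     # ascending by magnitude
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, order[..., -n:], True, axis=-1)  # keep top-n per group
    return (groups * mask).reshape(w.shape), mask.reshape(w.shape)

# Toy usage: each group of four weights keeps exactly its two largest magnitudes.
w = np.array([[1.0, -2.0, 3.0, -4.0],
              [5.0, -6.0, 7.0, -8.0]])
pruned, mask = prune_n_m(w)
assert (mask.reshape(-1, 4).sum(axis=1) == 2).all()
assert np.array_equal(pruned, [[0.0, 0.0, 3.0, -4.0], [0.0, 0.0, 7.0, -8.0]])
```

Because every group keeps exactly two nonzeros, the packed values plus small per-group position indices can feed a fixed-width sparse matrix unit, which is how the structured pattern translates into higher math throughput.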