Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition
- URL: http://arxiv.org/abs/2403.07953v2
- Date: Sun, 31 Mar 2024 23:47:47 GMT
- Title: Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition
- Authors: Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna,
- Abstract summary: Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the growing computation need of modern DNNs.
Structured sparse hardware support provides limited flexibility and requires extra model fine-tuning.
This paper proposes tensor approximation via structured decomposition (TASD) to bridge the gap between sparse DNN models and hardware.
- Score: 6.1810913678161405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the growing computation need of modern DNNs. However, in practice, sparse DNN acceleration still faces a key challenge. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparse hardware support recently, which provides limited flexibility and requires extra model fine-tuning. Moreover, any sparse model fine-tuned for certain structured sparse hardware cannot be accelerated by other structured hardware. To bridge the gap between sparse DNN models and hardware, this paper proposes tensor approximation via structured decomposition (TASD), which leverages the distributive property in linear algebra to turn any sparse tensor into a series of structured sparse tensors. Next, we develop a software framework, TASDER, to accelerate DNNs by searching layer-wise, high-quality structured decomposition for both weight and activation tensors so that they can be accelerated by any systems with structured sparse hardware support. Evaluation results show that, by exploiting prior structured sparse hardware baselines, our method can accelerate off-the-shelf dense and sparse DNNs without fine-tuning and improves energy-delay-product by up to 83% and 74% on average.
Related papers
- Dynamic Sparse Training of Diagonally Sparse Networks [15.13506569122892]
unstructured sparsity often fails to translate into practical speedups on modern hardware.<n>We propose DynaDiag, a novel structured sparse-to-sparse method that performs at par with unstructured sparsity.<n>With 90% linear layers in ViTs, we observe up to a 3.13x speedup in online inference without sacrificing model performance.
arXiv Detail & Related papers (2025-06-13T04:01:34Z) - SpikeX: Exploring Accelerator Architecture and Network-Hardware Co-Optimization for Sparse Spiking Neural Networks [3.758294848902233]
We propose a novel systolic-array SNN accelerator architecture, called SpikeX, to take on the challenges and opportunities stemming from unstructured sparsity.<n>SpikeX reduces memory access and increases data sharing and hardware utilization targeting computations spanning both time and space.
arXiv Detail & Related papers (2025-05-18T08:07:44Z) - Dynamic Sparsity Is Channel-Level Sparsity Learner [91.31071026340746]
Dynamic sparse training (DST) is a leading sparse training approach.
Channel-aware dynamic sparse (Chase) seamlessly translates the promise of unstructured dynamic sparsity to channel-level sparsity.
Our approach translates unstructured sparsity to channel-wise sparsity.
arXiv Detail & Related papers (2023-05-30T23:33:45Z) - Dynamic Sparse Training with Structured Sparsity [11.778353786208765]
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training.
We propose a sparse-to-sparse DST method, Structured RigL (SRigL), to learn a variant of fine-grained structured N:M sparsity.
We demonstrate a real-world acceleration of 3.4x/2.5x on CPU for online inference and 1.7x/13.0x on GPU for inference with a batch size of 256.
arXiv Detail & Related papers (2023-05-03T17:48:55Z) - FSCNN: A Fast Sparse Convolution Neural Network Inference System [31.474696818171953]
Convolution neural networks (CNNs) have achieved remarkable success, but typically accompany high computation cost and numerous redundant weight parameters.
To reduce the FLOPs, structure pruning is a popular approach to remove the entire hidden structures via introducing coarse-grained sparsity.
We present an efficient convolution neural network inference system to accelerate its forward pass by utilizing the fine-grained sparsity of compressed CNNs.
arXiv Detail & Related papers (2022-12-17T06:44:58Z) - DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware
Efficiency of Compact Neural Networks [29.46621102184345]
We propose a framework dubbed DepthShrinker to develop hardware-friendly compact networks.
Our framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques.
arXiv Detail & Related papers (2022-06-02T02:32:47Z) - Truncated tensor Schatten p-norm based approach for spatiotemporal
traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including missing and three fiber-like missing cases according to the mode-drivenn fibers.
Despite nonity of the objective function in our model, we derive the optimal solutions by integrating alternating data-mputation method of multipliers.
arXiv Detail & Related papers (2022-05-19T08:37:56Z) - Sub-bit Neural Networks: Learning to Compress and Accelerate Binary
Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and the hardware deployment on FPGA validate the great potentials of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - SparseDNN: Fast Sparse Deep Learning Inference on CPUs [1.6244541005112747]
We present SparseDNN, a sparse deep learning inference engine targeting CPUs.
We show that our sparse code generator can achieve significant speedups over state-of-the-art sparse and dense libraries.
arXiv Detail & Related papers (2021-01-20T03:27:35Z) - Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z) - Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network
Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models without first training, then pruning, and finally retraining, a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$times$ less energy and offers up to 4$times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z) - An Image Enhancing Pattern-based Sparsity for Real-time Inference on
Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity that comprises pattern and connectivity sparsity, and becoming both highly accurate and hardware friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.