SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading
Acceleration
- URL: http://arxiv.org/abs/2310.06218v1
- Date: Tue, 10 Oct 2023 00:22:27 GMT
- Title: SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading
Acceleration
- Authors: Jingyang Xiang and Siqi Li and Jun Chen and Shipeng Bai and Yukai Ma
and Guang Dai and Yong Liu
- Abstract summary: The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources.
Recent work requires selecting and fine-tuning 1$\times$N sparse weights based on dense pre-trained weights.
This paper proposes a novel \emph{\textbf{S}oft \textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach to train a uniform 1$\times$N sparse structured network from scratch.
- Score: 16.846777341261436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of sparsity in Convolutional Neural Networks (CNNs) has become
widespread to compress and accelerate models in environments with limited
resources. By constraining N consecutive weights along the output channel to be
group-wise non-zero, the recent network with 1$\times$N sparsity has received
tremendous popularity for its three outstanding advantages: 1) A large amount
of storage space saving by a \emph{Block Sparse Row} matrix. 2) Excellent
performance at a high sparsity. 3) Significant speedups on CPUs with Advanced
Vector Extensions. Recent work requires selecting and fine-tuning 1$\times$N
sparse weights based on dense pre-trained weights, leading to problems such as
expensive training cost and memory access, sub-optimal model quality, and
unbalanced workload across threads (different sparsity across output channels).
To overcome these issues, this paper proposes a novel \emph{\textbf{S}oft
\textbf{U}niform \textbf{B}lock \textbf{P}runing} (SUBP) approach to train a
uniform 1$\times$N sparse structured network from scratch. Specifically, our
approach repeatedly allows pruned blocks to regrow into the network, based on
block angular redundancy and importance sampling, in a uniform manner
throughout the training process. This not only makes the model less dependent
on pre-training and reduces both model redundancy and the risk of permanently
pruning important blocks, but also achieves a balanced workload. Empirically,
on ImageNet, comprehensive experiments across various CNN architectures show
that our SUBP consistently outperforms existing 1$\times$N and structured
sparsity methods based on pre-trained models or training from scratch. Source
codes and models are available at \url{https://github.com/JingyangXiang/SUBP}.
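For intuition, here is a minimal NumPy sketch of one uniform 1$\times$N masking step with soft regrowth, in the spirit of the abstract above. It is not the released implementation: the function name subp_block_mask, the l1 block score, and the fixed regrow fraction are illustrative assumptions standing in for the paper's block angular redundancy and importance-sampling criteria. A 1$\times$N block here is N consecutive output-channel weights at one (input channel, spatial) position, and every group of N output channels keeps the same number of blocks, which is what balances the per-thread workload.

```python
import numpy as np

def subp_block_mask(weight, N=4, sparsity=0.75, regrow_frac=0.1, seed=0):
    """Illustrative uniform 1xN block mask with soft regrowth (not the paper's code).

    weight: conv kernel (C_out, C_in, Kh, Kw), with C_out divisible by N.
    A block is N consecutive output-channel weights sharing one
    (input channel, spatial) position; each group of N output channels keeps
    the same number of blocks, so multithreaded inference stays balanced.
    """
    rng = np.random.default_rng(seed)
    c_out = weight.shape[0]
    w2d = weight.reshape(c_out, -1)                    # (C_out, C_in*Kh*Kw)
    cols = w2d.shape[1]
    blocks = w2d.reshape(c_out // N, N, cols)          # one 1xN block per (group, column)
    scores = np.abs(blocks).sum(axis=1)                # l1 importance per block (assumption)
    keep = int(round(cols * (1.0 - sparsity)))         # identical for every group -> uniform

    mask = np.zeros_like(scores, dtype=bool)
    for g in range(scores.shape[0]):
        mask[g, np.argsort(scores[g])[-keep:]] = True  # most important blocks survive

        # Soft step: regrow a few pruned blocks, sampled in proportion to their
        # importance, so no block is discarded permanently early in training.
        pruned = np.flatnonzero(~mask[g])
        n_regrow = min(len(pruned), int(round(regrow_frac * keep)))
        if n_regrow:
            p = scores[g, pruned] + 1e-12
            mask[g, rng.choice(pruned, n_regrow, replace=False, p=p / p.sum())] = True

    # Broadcast the per-block decision back to the full weight shape.
    return np.repeat(mask[:, None, :], N, axis=1).reshape(weight.shape)

# Example: a 3x3 convolution with 64 output and 32 input channels.
w = np.random.randn(64, 32, 3, 3)
m = subp_block_mask(w, N=4, sparsity=0.75)
print(m.mean())  # fraction of weights kept, slightly above 0.25 because of regrowth
```

Because every group ends up with an identical number of surviving blocks, the nonzero blocks map directly onto the Block Sparse Row layout mentioned in the abstract, with equal work per thread.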
Related papers
- Fully $1\times1$ Convolutional Network for Lightweight Image
Super-Resolution [79.04007257606862]
Deep models have made significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or more).
$1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations.
We propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet).
arXiv Detail & Related papers (2023-07-30T06:24:03Z) - Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints [59.39280540478479]
We propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
We show that sparsely upcycled T5 Base, Large, and XL language models and Vision Transformer Base and Large models, respectively, significantly outperform their dense counterparts on SuperGLUE and ImageNet.
arXiv Detail & Related papers (2022-12-09T18:57:37Z) - Sparse Random Networks for Communication-Efficient Federated Learning [23.614934319624826]
One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round.
We propose a radically different approach that does not update the weights at all.
Instead, our method freezes the weights at their initial \emph{random} values and learns how to sparsify the random network for the best performance. A minimal sketch of this frozen-weights-plus-learned-mask idea appears after this list.
arXiv Detail & Related papers (2022-09-30T09:11:09Z) - Not All Models Are Equal: Predicting Model Transferability in a
Self-challenging Fisher Space [51.62131362670815]
This paper addresses the problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks.
It proposes a new transferability metric called \textbf{S}elf-challenging \textbf{F}isher \textbf{D}iscriminant \textbf{A}nalysis (\textbf{SFDA}).
arXiv Detail & Related papers (2022-07-07T01:33:25Z) - Superposing Many Tickets into One: A Performance Booster for Sparse
Neural Network Training [32.30355584300427]
We present a novel sparse training approach, termed \textbf{Sup-tickets}, which can satisfy two desiderata concurrently in a single sparse-to-sparse training process.
Across various modern architectures on CIFAR-10/100 and ImageNet, we show that Sup-tickets integrates seamlessly with the existing sparse training methods.
arXiv Detail & Related papers (2022-05-30T16:01:32Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training
with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - 1$\times$N Block Pattern for Network Sparsity [90.43191747596491]
We propose a novel concept of $1\times N$ block sparsity pattern (block pruning) to break this limitation.
Our pattern obtains about 3.0% improvements over filter pruning in the top-1 accuracy of MobileNet-V2.
It also obtains 56.04ms inference savings on Cortex-A7 CPU over weight pruning.
arXiv Detail & Related papers (2021-05-31T05:50:33Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch. A minimal N:M masking sketch appears after this list.
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network
Training [0.5219568203653523]
We develop a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models, without first training, then pruning, and finally retraining a dense model.
Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.
arXiv Detail & Related papers (2020-09-23T07:39:55Z)
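For contrast with the 1$\times$N pattern above, the following NumPy sketch builds the kind of mask described in the "Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch" entry: within every group of M consecutive weights along the flattened input dimension, only the N largest-magnitude weights are kept (2:4 in the example). The magnitude criterion and the layer shape are illustrative assumptions, not code from that paper.

```python
import numpy as np

def n_m_mask(weight, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights along the flattened input dimension (the N:M pattern, e.g. 2:4).
    Assumes the flattened row length is divisible by m."""
    w2d = weight.reshape(weight.shape[0], -1)                 # (C_out, C_in*Kh*Kw)
    rows, cols = w2d.shape
    groups = np.abs(w2d).reshape(rows, cols // m, m)          # magnitudes per m-wide group
    order = np.argsort(groups, axis=-1)                       # ascending rank inside each group
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, order[..., -n:], True, axis=-1)   # n winners per group
    return mask.reshape(weight.shape)

w = np.random.randn(64, 32, 3, 3)
print(n_m_mask(w).mean())  # exactly n/m = 0.5 of the weights are kept
```

Unlike the 1$\times$N block mask, every group of M weights here is pruned independently of the output channels, which is the fine-grained pattern targeted by sparse tensor-core hardware.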
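Finally, for the "Sparse Random Networks for Communication-Efficient Federated Learning" entry, here is a minimal sketch of the frozen-weights-plus-learned-mask idea it describes. The dense-layer shapes, the score-threshold rule, and the shared seed are assumptions for illustration; in the described method the weights are never updated, only the mask is learned, so the communication saving comes from exchanging a small binary mask instead of dense weight updates.

```python
import numpy as np

def masked_forward(x, weight, scores, sparsity=0.5):
    """Forward pass with frozen random weights and a learned binary mask.

    `weight` is never updated; only `scores` would be trained (typically with a
    straight-through estimator). With a shared seed, every client can
    regenerate `weight` locally and exchange only the binary mask.
    """
    k = int(round(scores.size * (1.0 - sparsity)))   # how many weights stay active
    thresh = np.partition(scores.ravel(), -k)[-k]    # k-th largest score
    mask = (scores >= thresh).astype(weight.dtype)
    return x @ (weight * mask).T                     # dense layer: (batch, in) -> (batch, out)

rng = np.random.default_rng(0)                       # shared seed: same random weights everywhere
weight = rng.standard_normal((16, 32))               # frozen random (out, in) matrix
scores = rng.standard_normal((16, 32))               # trainable mask scores
x = rng.standard_normal((4, 32))
print(masked_forward(x, weight, scores).shape)       # (4, 16)
```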