1$\times$N Block Pattern for Network Sparsity
- URL: http://arxiv.org/abs/2105.14713v2
- Date: Tue, 1 Jun 2021 11:59:07 GMT
- Title: 1$\times$N Block Pattern for Network Sparsity
- Authors: Mingbao Lin, Yuchao Li, Yuxin Zhang, Bohong Chen, Fei Chao, Mengdi
Wang, Shen Li, Jun Yang, Rongrong Ji
- Abstract summary: We propose a novel $1\times N$ block sparsity pattern (block pruning) to break this limitation.
Our pattern improves the top-1 accuracy of MobileNet-V2 by about 3.0% over filter pruning.
It also saves 56.04 ms of inference time on a Cortex-A7 CPU compared with weight pruning.
- Score: 90.43191747596491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though network sparsity emerges as a promising direction to overcome the
drastically increasing size of neural networks, it remains an open problem to
concurrently maintain model accuracy as well as achieve significant speedups on
general CPUs. In this paper, we propose a novel $1\times N$ block sparsity pattern (block pruning) to break this limitation. In particular,
consecutive $N$ output kernels with the same input channel index are grouped
into one block, which serves as a basic pruning granularity of our pruning
pattern. Our $1 \times N$ sparsity pattern prunes the blocks that are considered unimportant. We also provide a filter-rearrangement workflow that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements, and then applies a similar rearrangement to the next-layer weights in the input channel dimension to
ensure correct convolutional operations. Moreover, the output computation after
our $1 \times N$ block sparsity can be realized via a parallelized block-wise
vectorized operation, leading to significant speedups on general CPU-based
platforms. The efficacy of our pruning pattern is demonstrated with experiments on ILSVRC-2012. For example, at 50% sparsity and $N=4$, our pattern improves the top-1 accuracy of MobileNet-V2 by about 3.0% over filter pruning. Meanwhile, it saves 56.04 ms of inference time on a Cortex-A7 CPU compared with weight pruning. Code is available at https://github.com/lmbxmu/1xN.
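To make the pruning granularity concrete, here is a minimal NumPy sketch, not the authors' released implementation: the L1-norm block score, the single global sparsity ratio, and the helper name `prune_1xN` are assumptions for illustration. It groups every $N$ consecutive output kernels that share an input channel index into one block, scores each block, and zeroes the least important blocks.

```python
# Minimal sketch of 1xN block pruning on a convolution weight tensor.
# Assumptions (not from the paper's released code): block importance is the
# L1 norm of the block, and a single global sparsity ratio is applied.
import numpy as np

def prune_1xN(weight: np.ndarray, N: int = 4, sparsity: float = 0.5) -> np.ndarray:
    """weight: (C_out, C_in, kH, kW); C_out is assumed divisible by N."""
    c_out, c_in, kh, kw = weight.shape
    assert c_out % N == 0, "output channels must be divisible by N"

    # Group N consecutive output kernels that share the same input channel
    # index into one block: shape (C_out//N, N, C_in, kH, kW).
    blocks = weight.reshape(c_out // N, N, c_in, kh, kw)

    # Block importance: L1 norm over the N kernels of each (group, input channel) pair.
    scores = np.abs(blocks).sum(axis=(1, 3, 4))          # (C_out//N, C_in)

    # Keep the (1 - sparsity) fraction of blocks with the highest scores.
    k = max(1, int(scores.size * (1.0 - sparsity)))
    threshold = np.sort(scores, axis=None)[::-1][k - 1]
    mask = (scores >= threshold).astype(weight.dtype)    # (C_out//N, C_in)

    # Broadcast the block mask back onto the individual weights.
    pruned = blocks * mask[:, None, :, None, None]
    return pruned.reshape(c_out, c_in, kh, kw)

# Example: a pointwise layer with 32 output and 16 input channels, N = 4, 50% sparsity.
w = np.random.randn(32, 16, 1, 1).astype(np.float32)
w_pruned = prune_1xN(w, N=4, sparsity=0.5)
print((w_pruned == 0).mean())                            # ~0.5, zeroed block-wise
```

The filter rearrangement described above would correspond to permuting the output-channel dimension before this grouping (and applying the same permutation to the next layer's weights along the input-channel dimension) so that more influential blocks survive the pruning step.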
Related papers
- BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration [9.092712730883887]
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable within bit-serial deep learning accelerators.
In this work, we improve the practicality and efficiency of bit-level sparsity through a novel algorithmic bit-pruning, averaging, and compression method.
On the hardware side, we demonstrate the potential of BBS through BitVert, a bit-serial architecture with an efficient PE design to accelerate DNNs with low overhead.
arXiv Detail & Related papers (2024-09-08T21:45:12Z) - PrivCirNet: Efficient Private Inference via Block Circulant Transformation [11.859511840002916]
Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead.
We propose PrivCirNet, a protocol/network co-optimization framework based on block circulant transformation.
PrivCirNet customizes the HE encoding algorithm to be fully compatible with the block circulant transformation.
arXiv Detail & Related papers (2024-05-23T13:44:48Z) - SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading
Acceleration [16.846777341261436]
The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources.
Recent work requires selecting and fine-tuning 1$\times$N sparse weights based on dense pre-trained weights.
This paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1$\times$N sparse structured network from scratch.
arXiv Detail & Related papers (2023-10-10T00:22:27Z) - Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Fully $1\times1$ Convolutional Network for Lightweight Image
Super-Resolution [79.04007257606862]
Deep models have made significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or more).
$1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations.
We propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet).
arXiv Detail & Related papers (2023-07-30T06:24:03Z) - Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic
Programming [15.458305667190256]
We propose a novel depth compression algorithm which targets general convolution operations.
We achieve a $1.41\times$ speed-up with a $0.11\%$p accuracy gain in MobileNetV2-1.0 on ImageNet.
arXiv Detail & Related papers (2023-01-28T13:08:54Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch. (A minimal N:M masking sketch appears after this list.)
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Discrimination-aware Network Pruning for Deep Model Compression [79.44318503847136]
Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones.
We propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
Experiments on both image classification and face recognition demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2020-01-04T07:07:41Z) - Pipelined Training with Stale Weights of Deep Convolutional Neural
Networks [0.1921787217122713]
We explore the impact of stale weights on the statistical efficiency and performance in a pipelined backpropagation scheme.
We show that when pipelining is limited to early layers in a network, training with stale weights converges and results in models with comparable inference accuracies.
We propose combining pipelined and non-pipelined training in a hybrid scheme to address the accuracy drop that occurs when pipelining is not limited to early layers.
arXiv Detail & Related papers (2019-12-29T15:28:13Z)
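For contrast with the 1$\times$N block pattern above, the N:M entry in this list keeps N nonzero weights in every group of M consecutive weights. The sketch below is an illustration under assumed choices (magnitude-based selection and a 2:4 ratio), not code from that paper.

```python
# Minimal sketch of N:M fine-grained structured sparsity (e.g., 2:4).
# Magnitude-based selection is an assumption; the cited paper learns such
# sparse networks from scratch rather than pruning a dense tensor.
import numpy as np

def nm_prune(weight: np.ndarray, n_keep: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n_keep largest-magnitude weights in every group of m
    consecutive weights (total number of weights divisible by m)."""
    flat = weight.reshape(-1, m)                       # groups of m weights
    # Indices of the (m - n_keep) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n_keep]
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (flat * mask).reshape(weight.shape)

w = np.random.randn(8, 16).astype(np.float32)
w_24 = nm_prune(w, n_keep=2, m=4)                      # 2:4 sparsity -> 50% zeros
print((w_24 == 0).mean())                              # 0.5
```

Whereas the 1$\times$N pattern zeroes whole blocks of N consecutive kernels, the N:M pattern distributes the zeros evenly within every small group of weights.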