1$\times$N Block Pattern for Network Sparsity
- URL: http://arxiv.org/abs/2105.14713v2
- Date: Tue, 1 Jun 2021 11:59:07 GMT
- Title: 1$\times$N Block Pattern for Network Sparsity
- Authors: Mingbao Lin, Yuchao Li, Yuxin Zhang, Bohong Chen, Fei Chao, Mengdi
Wang, Shen Li, Jun Yang, Rongrong Ji
- Abstract summary: We propose a novel $1\times N$ block sparsity pattern (block pruning) to break this limitation.
Our pattern improves the top-1 accuracy of MobileNet-V2 by about 3.0% over filter pruning.
It also saves 56.04 ms of inference time on a Cortex-A7 CPU compared with weight pruning.
- Score: 90.43191747596491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though network sparsity emerges as a promising direction to overcome the
drastically increasing size of neural networks, it remains an open problem to
concurrently maintain model accuracy as well as achieve significant speedups on
general CPUs. In this paper, we propose a novel $1\times N$ block sparsity pattern (block pruning) to break this limitation. In particular,
consecutive $N$ output kernels with the same input channel index are grouped
into one block, which serves as a basic pruning granularity of our pruning
pattern. Our $1 \times N$ sparsity pattern prunes the blocks that are considered unimportant. We also provide a filter-rearrangement workflow that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements, and then applies a similar rearrangement to the next-layer weights in the input channel dimension to
ensure correct convolutional operations. Moreover, the output computation after
our $1 \times N$ block sparsity can be realized via a parallelized block-wise
vectorized operation, leading to significant speedups on general CPU-based
platforms. The efficacy of our pruning pattern is demonstrated with experiments on ILSVRC-2012. For example, at 50% sparsity and $N=4$, our pattern improves the top-1 accuracy of MobileNet-V2 by about 3.0% over filter pruning. Meanwhile, it saves 56.04 ms of inference time on a Cortex-A7 CPU compared with weight pruning. Code is available at https://github.com/lmbxmu/1xN.
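To make the pruning granularity concrete, here is a minimal NumPy sketch, not the authors' released implementation: the L1-norm block score, the single global sparsity ratio, and the helper name `prune_1xN` are assumptions for illustration. It groups every $N$ consecutive output kernels that share an input channel index into one block, scores each block, and zeroes the least important blocks.

```python
# Minimal sketch of 1xN block pruning on a convolution weight tensor.
# Assumptions (not from the paper's released code): block importance is the
# L1 norm of the block, and a single global sparsity ratio is applied.
import numpy as np

def prune_1xN(weight: np.ndarray, N: int = 4, sparsity: float = 0.5) -> np.ndarray:
    """weight: (C_out, C_in, kH, kW); C_out is assumed divisible by N."""
    c_out, c_in, kh, kw = weight.shape
    assert c_out % N == 0, "output channels must be divisible by N"

    # Group N consecutive output kernels that share the same input channel
    # index into one block: shape (C_out//N, N, C_in, kH, kW).
    blocks = weight.reshape(c_out // N, N, c_in, kh, kw)

    # Block importance: L1 norm over the N kernels of each (group, input channel) pair.
    scores = np.abs(blocks).sum(axis=(1, 3, 4))          # (C_out//N, C_in)

    # Keep the (1 - sparsity) fraction of blocks with the highest scores.
    k = max(1, int(scores.size * (1.0 - sparsity)))
    threshold = np.sort(scores, axis=None)[::-1][k - 1]
    mask = (scores >= threshold).astype(weight.dtype)    # (C_out//N, C_in)

    # Broadcast the block mask back onto the individual weights.
    pruned = blocks * mask[:, None, :, None, None]
    return pruned.reshape(c_out, c_in, kh, kw)

# Example: a pointwise layer with 32 output and 16 input channels, N = 4, 50% sparsity.
w = np.random.randn(32, 16, 1, 1).astype(np.float32)
w_pruned = prune_1xN(w, N=4, sparsity=0.5)
print((w_pruned == 0).mean())                            # ~0.5, zeroed block-wise
```

The filter rearrangement described above would correspond to permuting the output-channel dimension before this grouping (and applying the same permutation to the next layer's weights along the input-channel dimension) so that more influential blocks survive the pruning step.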
Related papers
- BBS: Bi-directional Bit-level Sparsity for Deep Learning Acceleration [9.092712730883887]
Bit-level sparsity methods skip ineffectual zero-bit operations and are typically applicable within bit-serial deep learning accelerators.
In this work, we improve the practicality and efficiency of bit-level sparsity through a novel algorithmic bit-pruning, averaging, and compression method.
On the hardware side, we demonstrate the potential of BBS through BitVert, a bit-serial architecture with an efficient PE design to accelerate DNNs with low overhead.
arXiv Detail & Related papers (2024-09-08T21:45:12Z) - PrivCirNet: Efficient Private Inference via Block Circulant Transformation [11.859511840002916]
Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead.
We propose PrivCirNet, a protocol/network co-optimization framework based on block circulant transformation.
PrivCirNet customizes the HE encoding algorithm to be fully compatible with the block circulant transformation.
arXiv Detail & Related papers (2024-05-23T13:44:48Z) - SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading
Acceleration [16.846777341261436]
The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources.
Recent work requires selecting and fine-tuning 1$\times$N sparse weights based on dense pre-trained weights.
This paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1$\times$N sparse structured network from scratch.
arXiv Detail & Related papers (2023-10-10T00:22:27Z) - Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Fully $1\times1$ Convolutional Network for Lightweight Image
Super-Resolution [79.04007257606862]
Deep models have made significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or more).
$1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations.
We propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet).
arXiv Detail & Related papers (2023-07-30T06:24:03Z) - Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic
Programming [15.458305667190256]
We propose a novel depth compression algorithm which targets general convolution operations.
We achieve a $1.41\times$ speed-up with a $0.11\%$p accuracy gain in MobileNetV2-1.0 on ImageNet.
arXiv Detail & Related papers (2023-01-28T13:08:54Z) - The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich
Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance-limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch. (A minimal N:M masking sketch appears after this list.)
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Discrimination-aware Network Pruning for Deep Model Compression [79.44318503847136]
Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones.
We propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
Experiments on both image classification and face recognition demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2020-01-04T07:07:41Z) - Pipelined Training with Stale Weights of Deep Convolutional Neural
Networks [0.1921787217122713]
We explore the impact of stale weights on the statistical efficiency and performance in a pipelined backpropagation scheme.
We show that when pipelining is limited to early layers in a network, training with stale weights converges and results in models with comparable inference accuracies.
We propose combining pipelined and non-pipelined training in a hybrid scheme to address the accuracy drop that occurs when pipelining is not limited to early layers.
arXiv Detail & Related papers (2019-12-29T15:28:13Z)
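For contrast with the 1$\times$N block pattern above, the N:M entry in this list keeps N nonzero weights in every group of M consecutive weights. The sketch below is an illustration under assumed choices (magnitude-based selection and a 2:4 ratio), not code from that paper.

```python
# Minimal sketch of N:M fine-grained structured sparsity (e.g., 2:4).
# Magnitude-based selection is an assumption; the cited paper learns such
# sparse networks from scratch rather than pruning a dense tensor.
import numpy as np

def nm_prune(weight: np.ndarray, n_keep: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n_keep largest-magnitude weights in every group of m
    consecutive weights (total number of weights divisible by m)."""
    flat = weight.reshape(-1, m)                       # groups of m weights
    # Indices of the (m - n_keep) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n_keep]
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (flat * mask).reshape(weight.shape)

w = np.random.randn(8, 16).astype(np.float32)
w_24 = nm_prune(w, n_keep=2, m=4)                      # 2:4 sparsity -> 50% zeros
print((w_24 == 0).mean())                              # 0.5
```

Whereas the 1$\times$N pattern zeroes whole blocks of N consecutive kernels, the N:M pattern distributes the zeros evenly within every small group of weights.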