Weight-dependent Gates for Network Pruning
- URL: http://arxiv.org/abs/2007.02066v4
- Date: Sat, 14 May 2022 11:40:51 GMT
- Title: Weight-dependent Gates for Network Pruning
- Authors: Yun Li, Zechun Liu, Weiqun Wu, Haotian Yao, Xiangyu Zhang, Chi Zhang,
Baoqun Yin
- Abstract summary: This paper argues that the pruning decision should depend on the convolutional weights, and thus proposes novel weight-dependent gates (W-Gates) to learn the information from filter weights and obtain binary gates to prune or keep the filters automatically.
We have demonstrated the effectiveness of the proposed method on ResNet34, ResNet50, and MobileNet V2, respectively achieving up to 1.33%/1.28%/1.1% higher Top-1 accuracy with lower hardware latency on ImageNet.
- Score: 24.795174721078528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a simple yet effective network pruning framework is proposed
to simultaneously address the problems of pruning indicator, pruning ratio, and
efficiency constraint. This paper argues that the pruning decision should
depend on the convolutional weights, and thus proposes novel weight-dependent
gates (W-Gates) to learn the information from filter weights and obtain binary
gates to prune or keep the filters automatically. To prune the network under
efficiency constraints, a switchable Efficiency Module is constructed to
predict the hardware latency or FLOPs of candidate pruned networks. Combined
with the proposed Efficiency Module, W-Gates can perform filter pruning in an
efficiency-aware manner and achieve a compact network with a better
accuracy-efficiency trade-off. We have demonstrated the effectiveness of the
proposed method on ResNet34, ResNet50, and MobileNet V2, respectively achieving
up to 1.33%/1.28%/1.1% higher Top-1 accuracy with lower hardware latency on
ImageNet. Compared with state-of-the-art methods, W-Gates also achieves
superior performance.
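
The abstract describes W-Gates only at a high level: a small network reads each layer's filter weights and emits binary keep/prune gates, trained jointly with the backbone under an efficiency penalty supplied by the Efficiency Module. The PyTorch snippet below is a rough, hedged sketch of that idea, not the authors' implementation: it uses a per-filter scoring MLP with a straight-through estimator for binarization, and a toy FLOPs-style penalty standing in for the learned latency predictor. All class and function names (WGate, GatedConv, flops_penalty) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class WGate(nn.Module):
    """Weight-dependent gate: maps a conv layer's filter weights to
    per-filter binary gates (1 = keep, 0 = prune). Illustrative sketch."""

    def __init__(self, in_features: int, hidden: int = 64):
        super().__init__()
        # Small fully connected net that scores each flattened filter.
        self.score_net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, conv_weight: torch.Tensor) -> torch.Tensor:
        # conv_weight: (out_channels, in_channels, kH, kW)
        flat = conv_weight.flatten(1)              # one row per filter
        scores = self.score_net(flat).squeeze(-1)  # real-valued filter scores
        hard = (scores > 0).float()                # binary gate
        soft = torch.sigmoid(scores)               # differentiable surrogate
        # Straight-through estimator: forward uses hard gates,
        # backward flows through the sigmoid surrogate.
        return hard + soft - soft.detach()


class GatedConv(nn.Module):
    """Conv layer whose output channels are masked by weight-dependent gates."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.gate = WGate(in_ch * k * k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(self.conv.weight)            # (out_ch,)
        y = self.conv(x)
        return y * g.view(1, -1, 1, 1)             # zero out pruned filters


def flops_penalty(layers, target_ratio: float = 0.5) -> torch.Tensor:
    """Crude stand-in for the Efficiency Module: penalize the expected
    fraction of kept filters exceeding a target ratio."""
    kept = torch.stack([layer.gate(layer.conv.weight).mean() for layer in layers])
    return torch.relu(kept.mean() - target_ratio)


if __name__ == "__main__":
    layer = GatedConv(16, 32)
    x = torch.randn(2, 16, 8, 8)
    out = layer(x)                                  # gated feature map
    loss = out.abs().mean() + 0.1 * flops_penalty([layer])
    loss.backward()                                 # gates get gradients via the STE
```

In the actual method, the efficiency term is produced by a switchable module that predicts hardware latency or FLOPs of the candidate pruned network; the penalty above only mimics its role in the joint objective.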
Related papers
- UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a recently emerged direction that pursues efficient network sparsity using only a limited amount of data.
In this paper, we attempt to reconcile the performance disparity between PTS and conventional sparsity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS.
Our framework, termed UniPTS, proves much superior to existing PTS methods across extensive benchmarks.
arXiv Detail & Related papers (2024-05-29T06:53:18Z) - Efficient Modulation for Vision Networks [122.1051910402034]
We propose efficient modulation, a novel design for efficient vision networks.
We demonstrate that the modulation mechanism is particularly well suited for efficient networks.
Our network achieves better trade-offs between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-29T03:48:35Z) - Layer-adaptive Structured Pruning Guided by Latency [7.193554978191659]
Structured pruning can simplify network architecture and improve inference speed.
We propose SP-LAMP, a global importance score for structured pruning, derived by carrying the LAMP score over from unstructured to structured pruning.
Experimental results with ResNet56 on CIFAR10 demonstrate that our algorithm achieves lower latency than alternative approaches.
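
This summary does not give the SP-LAMP formula, but the LAMP score it builds on is published for unstructured pruning: each weight is scored by its squared magnitude divided by the sum of squared magnitudes of all weights at least as large in the same layer. Below is a hedged sketch of one plausible structured adaptation, treating each filter's squared L2 norm as the "weight"; the function name and the filter-level lifting are my assumptions, not necessarily SP-LAMP's exact definition.

```python
import numpy as np


def filter_lamp_scores(conv_weight: np.ndarray) -> np.ndarray:
    """LAMP-style score per filter (assumed structured variant).

    LAMP scores a weight w as w^2 divided by the sum of w'^2 over all
    weights w' with |w'| >= |w| in the same layer; here each filter acts
    as one 'weight' via its squared L2 norm.
    conv_weight: (out_channels, in_channels, kH, kW)
    """
    sq_norms = (conv_weight.reshape(conv_weight.shape[0], -1) ** 2).sum(axis=1)
    order = np.argsort(sq_norms)                   # ascending by magnitude
    sorted_sq = sq_norms[order]
    # Denominator: cumulative sum of squared norms from this filter upward.
    denom = np.cumsum(sorted_sq[::-1])[::-1]
    scores = np.empty_like(sq_norms)
    scores[order] = sorted_sq / denom
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 4, 3, 3))
    s = filter_lamp_scores(w)
    keep = s >= np.sort(s)[2]                       # prune the 2 lowest-scoring filters
    print("scores:", np.round(s, 4), "keep mask:", keep)
```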
arXiv Detail & Related papers (2023-05-23T11:18:37Z) - Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning [19.978542231976636]
This paper proposes a novel method to reduce the parameters and FLOPs for computational efficiency in deep learning models.
We introduce accuracy and efficiency coefficients to control the trade-off between the accuracy of the network and its computing efficiency.
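
The summary mentions accuracy and efficiency coefficients that scalarize the trade-off. As a minimal illustration only (the paper's actual reward is not given here), such a reward might be combined as follows; all names and coefficient values are placeholders:

```python
def pruning_reward(top1_acc: float, flops: float, flops_budget: float,
                   acc_coeff: float = 1.0, eff_coeff: float = 0.5) -> float:
    """Toy scalar reward trading off accuracy against compute.

    acc_coeff and eff_coeff play the role of the accuracy and efficiency
    coefficients mentioned in the abstract; the exact form may differ.
    """
    efficiency = 1.0 - min(flops / flops_budget, 1.0)   # 0 = at budget, 1 = free
    return acc_coeff * top1_acc + eff_coeff * efficiency


# Example: a 72.3%-accurate candidate using 60% of the FLOPs budget.
print(pruning_reward(top1_acc=0.723, flops=0.6e9, flops_budget=1.0e9))
```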
arXiv Detail & Related papers (2023-01-26T12:32:01Z) - Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural
Network Inference [3.2296078260106174]
Existing implementations of this class of LUT-based architecture require the manual specification of the number of inputs per LUT, K.
We propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference.
This learned optimization of LUT-based topologies results in higher-efficiency designs.
arXiv Detail & Related papers (2021-12-04T14:23:24Z) - CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is well suited to adaptively pruning efficient networks for various classification subtasks, facilitating the practical deployment and use of deep networks in real-world applications.
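
The trace-ratio idea behind CATRO can be illustrated generically: score a subset of channels by the ratio of between-class to within-class scatter of their pooled activations, and grow the kept set greedily. The sketch below is only that generic illustration under my own naming; CATRO's exact objective and optimization are not specified in this summary and differ in detail.

```python
import numpy as np


def trace_ratio(feats: np.ndarray, labels: np.ndarray, channels: list[int]) -> float:
    """Ratio of between-class to within-class scatter traces on `channels`.

    feats: (num_samples, num_channels) pooled activations; labels: (num_samples,)
    """
    x = feats[:, channels]
    mu = x.mean(axis=0)
    sb = sw = 0.0
    for c in np.unique(labels):
        xc = x[labels == c]
        mu_c = xc.mean(axis=0)
        sb += len(xc) * np.sum((mu_c - mu) ** 2)   # trace of between-class scatter
        sw += np.sum((xc - mu_c) ** 2)             # trace of within-class scatter
    return sb / (sw + 1e-12)


def greedy_channel_select(feats: np.ndarray, labels: np.ndarray, keep: int) -> list[int]:
    """Greedily add the channel that most improves the trace ratio."""
    selected: list[int] = []
    remaining = list(range(feats.shape[1]))
    while len(selected) < keep:
        best = max(remaining, key=lambda c: trace_ratio(feats, labels, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 16))
    labels = rng.integers(0, 4, size=200)
    feats[:, 3] += labels            # make channel 3 class-discriminative
    print(greedy_channel_select(feats, labels, keep=4))
```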
arXiv Detail & Related papers (2021-10-21T06:26:31Z) - HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT), which replaces inefficient operations with more efficient alternatives using a neural-architecture-search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z) - Connectivity Matters: Neural Network Pruning Through the Lens of
Effective Sparsity [0.0]
Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes.
We show that effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart.
We develop a low-cost extension to most pruning algorithms to aim for effective, rather than direct, sparsity.
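
Direct sparsity counts only the weights explicitly set to zero, whereas effective sparsity also discounts surviving weights that no longer connect input to output (for instance, because every weight into or out of some neuron has been pruned). A small, assumption-laden sketch of that bookkeeping for fully connected masks, such as a LeNet-300-100, follows; the function name and iteration scheme are mine, not the paper's.

```python
import numpy as np


def effective_sparsity(masks: list[np.ndarray]) -> float:
    """Sparsity after discounting weights attached to disconnected neurons.

    masks: binary (out, in) masks for consecutive fully connected layers;
    LeNet-300-100 would pass shapes (300, 784), (100, 300), (10, 100).
    """
    masks = [m.copy() for m in masks]
    changed = True
    while changed:                                    # repeat until no more neurons die
        changed = False
        for i in range(len(masks) - 1):
            dead_out = masks[i].sum(axis=1) == 0      # hidden units with no inputs
            dead_in = masks[i + 1].sum(axis=0) == 0   # hidden units with no outputs
            dead = dead_out | dead_in
            if dead.any():
                before = masks[i].sum() + masks[i + 1].sum()
                masks[i][dead, :] = 0                 # drop their incoming weights
                masks[i + 1][:, dead] = 0             # and their outgoing weights
                after = masks[i].sum() + masks[i + 1].sum()
                changed |= after < before
    total = sum(m.size for m in masks)
    remaining = sum(m.sum() for m in masks)
    return 1.0 - remaining / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Randomly prune 95% of each layer; direct sparsity is 0.95 by construction.
    shapes = [(300, 784), (100, 300), (10, 100)]
    masks = [(rng.random(s) > 0.95).astype(np.int64) for s in shapes]
    print("effective sparsity:", round(effective_sparsity(masks), 4))
```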
arXiv Detail & Related papers (2021-07-05T22:36:57Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net gains the ability of dynamic inference through the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
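
The double-headed dynamic gate is only named, not specified, in this summary. The PyTorch sketch below is a loose interpretation: one head re-weights channels from global context while the other predicts which of several widths to run, with hard selection at inference. All names and the exact head design are assumptions, not DS-Net's architecture.

```python
import torch
import torch.nn as nn


class DynamicWidthGate(nn.Module):
    """Input-dependent gate choosing one of several channel widths.

    Loosely inspired by a double-headed gate: one head re-weights channels
    (SE-style), the other predicts which width to run.
    """

    def __init__(self, channels: int, width_choices=(0.25, 0.5, 0.75, 1.0)):
        super().__init__()
        self.width_choices = width_choices
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.attn_head = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.width_head = nn.Linear(channels, len(width_choices))

    def forward(self, x: torch.Tensor):
        ctx = self.pool(x).flatten(1)                      # (N, C) global context
        attn = self.attn_head(ctx).view(x.size(0), -1, 1, 1)
        width_logits = self.width_head(ctx)                # (N, num_choices)
        # Hard argmax over the batch-averaged logits; training would need a
        # differentiable relaxation (e.g. Gumbel-softmax) and could act per sample.
        choice = width_logits.mean(0).argmax().item()
        ratio = self.width_choices[choice]
        keep = int(x.size(1) * ratio)
        mask = torch.zeros(1, x.size(1), 1, 1, device=x.device)
        mask[:, :keep] = 1.0                               # keep the first `keep` channels
        return x * attn * mask, ratio


if __name__ == "__main__":
    gate = DynamicWidthGate(channels=32)
    x = torch.randn(4, 32, 16, 16)
    y, ratio = gate(x)
    print(y.shape, "running at width ratio", ratio)
```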
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)