Weight-dependent Gates for Network Pruning
- URL: http://arxiv.org/abs/2007.02066v4
- Date: Sat, 14 May 2022 11:40:51 GMT
- Title: Weight-dependent Gates for Network Pruning
- Authors: Yun Li, Zechun Liu, Weiqun Wu, Haotian Yao, Xiangyu Zhang, Chi Zhang,
Baoqun Yin
- Abstract summary: This paper argues that the pruning decision should depend on the convolutional weights, and thus proposes novel weight-dependent gates (W-Gates) to learn the information from filter weights and obtain binary gates to prune or keep the filters automatically.
We have demonstrated the effectiveness of the proposed method on ResNet34, ResNet50, and MobileNet V2, respectively achieving up to 1.33%/1.28%/1.1% higher Top-1 accuracy with lower hardware latency on ImageNet.
- Score: 24.795174721078528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a simple yet effective network pruning framework is proposed
to simultaneously address the problems of pruning indicator, pruning ratio, and
efficiency constraint. This paper argues that the pruning decision should
depend on the convolutional weights, and thus proposes novel weight-dependent
gates (W-Gates) to learn the information from filter weights and obtain binary
gates to prune or keep the filters automatically. To prune the network under
efficiency constraints, a switchable Efficiency Module is constructed to
predict the hardware latency or FLOPs of candidate pruned networks. Combined
with the proposed Efficiency Module, W-Gates can perform filter pruning in an
efficiency-aware manner and achieve a compact network with a better
accuracy-efficiency trade-off. We have demonstrated the effectiveness of the
proposed method on ResNet34, ResNet50, and MobileNet V2, respectively achieving
up to 1.33%/1.28%/1.1% higher Top-1 accuracy with lower hardware latency on
ImageNet. Compared with state-of-the-art methods, W-Gates also achieves
superior performance.
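
The abstract describes W-Gates only at a high level: a small network reads each layer's filter weights and emits binary keep/prune gates, trained jointly with the backbone under an efficiency penalty supplied by the Efficiency Module. The PyTorch snippet below is a rough, hedged sketch of that idea, not the authors' implementation: it uses a per-filter scoring MLP with a straight-through estimator for binarization, and a toy FLOPs-style penalty standing in for the learned latency predictor. All class and function names (WGate, GatedConv, flops_penalty) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class WGate(nn.Module):
    """Weight-dependent gate: maps a conv layer's filter weights to
    per-filter binary gates (1 = keep, 0 = prune). Illustrative sketch."""

    def __init__(self, in_features: int, hidden: int = 64):
        super().__init__()
        # Small fully connected net that scores each flattened filter.
        self.score_net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, conv_weight: torch.Tensor) -> torch.Tensor:
        # conv_weight: (out_channels, in_channels, kH, kW)
        flat = conv_weight.flatten(1)              # one row per filter
        scores = self.score_net(flat).squeeze(-1)  # real-valued filter scores
        hard = (scores > 0).float()                # binary gate
        soft = torch.sigmoid(scores)               # differentiable surrogate
        # Straight-through estimator: forward uses hard gates,
        # backward flows through the sigmoid surrogate.
        return hard + soft - soft.detach()


class GatedConv(nn.Module):
    """Conv layer whose output channels are masked by weight-dependent gates."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.gate = WGate(in_ch * k * k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(self.conv.weight)            # (out_ch,)
        y = self.conv(x)
        return y * g.view(1, -1, 1, 1)             # zero out pruned filters


def flops_penalty(layers, target_ratio: float = 0.5) -> torch.Tensor:
    """Crude stand-in for the Efficiency Module: penalize the expected
    fraction of kept filters exceeding a target ratio."""
    kept = torch.stack([layer.gate(layer.conv.weight).mean() for layer in layers])
    return torch.relu(kept.mean() - target_ratio)


if __name__ == "__main__":
    layer = GatedConv(16, 32)
    x = torch.randn(2, 16, 8, 8)
    out = layer(x)                                  # gated feature map
    loss = out.abs().mean() + 0.1 * flops_penalty([layer])
    loss.backward()                                 # gates get gradients via the STE
```

In the actual method, the efficiency term is produced by a switchable module that predicts hardware latency or FLOPs of the candidate pruned network; the penalty above only mimics its role in the joint objective.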
Related papers
- UniPTS: A Unified Framework for Proficient Post-Training Sparsity [67.16547529992928]
Post-training Sparsity (PTS) is a recently emerged direction that pursues efficient network sparsity using only a limited amount of data.
In this paper, we attempt to reconcile the performance disparity between PTS and conventional sparsity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS.
Our framework, termed UniPTS, proves much superior to existing PTS methods across extensive benchmarks.
arXiv Detail & Related papers (2024-05-29T06:53:18Z) - Efficient Modulation for Vision Networks [122.1051910402034]
We propose efficient modulation, a novel design for efficient vision networks.
We demonstrate that the modulation mechanism is particularly well suited for efficient networks.
Our network achieves better trade-offs between accuracy and efficiency.
arXiv Detail & Related papers (2024-03-29T03:48:35Z) - Layer-adaptive Structured Pruning Guided by Latency [7.193554978191659]
Structured pruning can simplify network architecture and improve inference speed.
We propose SP-LAMP, a global importance score for structured pruning, derived by carrying the LAMP score over from unstructured to structured pruning.
Experimental results with ResNet56 on CIFAR10 demonstrate that our algorithm achieves lower latency than alternative approaches.
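
This summary does not give the SP-LAMP formula, but the LAMP score it builds on is published for unstructured pruning: each weight is scored by its squared magnitude divided by the sum of squared magnitudes of all weights at least as large in the same layer. Below is a hedged sketch of one plausible structured adaptation, treating each filter's squared L2 norm as the "weight"; the function name and the filter-level lifting are my assumptions, not necessarily SP-LAMP's exact definition.

```python
import numpy as np


def filter_lamp_scores(conv_weight: np.ndarray) -> np.ndarray:
    """LAMP-style score per filter (assumed structured variant).

    LAMP scores a weight w as w^2 divided by the sum of w'^2 over all
    weights w' with |w'| >= |w| in the same layer; here each filter acts
    as one 'weight' via its squared L2 norm.
    conv_weight: (out_channels, in_channels, kH, kW)
    """
    sq_norms = (conv_weight.reshape(conv_weight.shape[0], -1) ** 2).sum(axis=1)
    order = np.argsort(sq_norms)                   # ascending by magnitude
    sorted_sq = sq_norms[order]
    # Denominator: cumulative sum of squared norms from this filter upward.
    denom = np.cumsum(sorted_sq[::-1])[::-1]
    scores = np.empty_like(sq_norms)
    scores[order] = sorted_sq / denom
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 4, 3, 3))
    s = filter_lamp_scores(w)
    keep = s >= np.sort(s)[2]                       # prune the 2 lowest-scoring filters
    print("scores:", np.round(s, 4), "keep mask:", keep)
```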
arXiv Detail & Related papers (2023-05-23T11:18:37Z) - Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning [19.978542231976636]
This paper proposes a novel method to reduce the parameters and FLOPs for computational efficiency in deep learning models.
We introduce accuracy and efficiency coefficients to control the trade-off between the accuracy of the network and its computing efficiency.
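
The summary mentions accuracy and efficiency coefficients that scalarize the trade-off. As a minimal illustration only (the paper's actual reward is not given here), such a reward might be combined as follows; all names and coefficient values are placeholders:

```python
def pruning_reward(top1_acc: float, flops: float, flops_budget: float,
                   acc_coeff: float = 1.0, eff_coeff: float = 0.5) -> float:
    """Toy scalar reward trading off accuracy against compute.

    acc_coeff and eff_coeff play the role of the accuracy and efficiency
    coefficients mentioned in the abstract; the exact form may differ.
    """
    efficiency = 1.0 - min(flops / flops_budget, 1.0)   # 0 = at budget, 1 = free
    return acc_coeff * top1_acc + eff_coeff * efficiency


# Example: a 72.3%-accurate candidate using 60% of the FLOPs budget.
print(pruning_reward(top1_acc=0.723, flops=0.6e9, flops_budget=1.0e9))
```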
arXiv Detail & Related papers (2023-01-26T12:32:01Z) - Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural
Network Inference [3.2296078260106174]
Existing implementations of this class of LUT-based architecture require the manual specification of the number of inputs per LUT, K.
We propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference.
This learned optimization of LUT-based topologies results in higher-efficiency designs.
arXiv Detail & Related papers (2021-12-04T14:23:24Z) - CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is well suited to adaptively pruning efficient networks for various classification subtasks, facilitating the practical deployment and use of deep networks in real-world applications.
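
The trace-ratio idea behind CATRO can be illustrated generically: score a subset of channels by the ratio of between-class to within-class scatter of their pooled activations, and grow the kept set greedily. The sketch below is only that generic illustration under my own naming; CATRO's exact objective and optimization are not specified in this summary and differ in detail.

```python
import numpy as np


def trace_ratio(feats: np.ndarray, labels: np.ndarray, channels: list[int]) -> float:
    """Ratio of between-class to within-class scatter traces on `channels`.

    feats: (num_samples, num_channels) pooled activations; labels: (num_samples,)
    """
    x = feats[:, channels]
    mu = x.mean(axis=0)
    sb = sw = 0.0
    for c in np.unique(labels):
        xc = x[labels == c]
        mu_c = xc.mean(axis=0)
        sb += len(xc) * np.sum((mu_c - mu) ** 2)   # trace of between-class scatter
        sw += np.sum((xc - mu_c) ** 2)             # trace of within-class scatter
    return sb / (sw + 1e-12)


def greedy_channel_select(feats: np.ndarray, labels: np.ndarray, keep: int) -> list[int]:
    """Greedily add the channel that most improves the trace ratio."""
    selected: list[int] = []
    remaining = list(range(feats.shape[1]))
    while len(selected) < keep:
        best = max(remaining, key=lambda c: trace_ratio(feats, labels, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(200, 16))
    labels = rng.integers(0, 4, size=200)
    feats[:, 3] += labels            # make channel 3 class-discriminative
    print(greedy_channel_select(feats, labels, keep=4))
```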
arXiv Detail & Related papers (2021-10-21T06:26:31Z) - HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT), which replaces inefficient operations with more efficient alternatives using a neural-architecture-search-like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z) - Connectivity Matters: Neural Network Pruning Through the Lens of
Effective Sparsity [0.0]
Neural network pruning is a fruitful area of research with surging interest in high sparsity regimes.
We show that effective compression of a randomly pruned LeNet-300-100 can be orders of magnitude larger than its direct counterpart.
We develop a low-cost extension to most pruning algorithms to aim for effective, rather than direct, sparsity.
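
Direct sparsity counts only the weights explicitly set to zero, whereas effective sparsity also discounts surviving weights that no longer connect input to output (for instance, because every weight into or out of some neuron has been pruned). A small, assumption-laden sketch of that bookkeeping for fully connected masks, such as a LeNet-300-100, follows; the function name and iteration scheme are mine, not the paper's.

```python
import numpy as np


def effective_sparsity(masks: list[np.ndarray]) -> float:
    """Sparsity after discounting weights attached to disconnected neurons.

    masks: binary (out, in) masks for consecutive fully connected layers;
    LeNet-300-100 would pass shapes (300, 784), (100, 300), (10, 100).
    """
    masks = [m.copy() for m in masks]
    changed = True
    while changed:                                    # repeat until no more neurons die
        changed = False
        for i in range(len(masks) - 1):
            dead_out = masks[i].sum(axis=1) == 0      # hidden units with no inputs
            dead_in = masks[i + 1].sum(axis=0) == 0   # hidden units with no outputs
            dead = dead_out | dead_in
            if dead.any():
                before = masks[i].sum() + masks[i + 1].sum()
                masks[i][dead, :] = 0                 # drop their incoming weights
                masks[i + 1][:, dead] = 0             # and their outgoing weights
                after = masks[i].sum() + masks[i + 1].sum()
                changed |= after < before
    total = sum(m.size for m in masks)
    remaining = sum(m.sum() for m in masks)
    return 1.0 - remaining / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Randomly prune 95% of each layer; direct sparsity is 0.95 by construction.
    shapes = [(300, 784), (100, 300), (10, 100)]
    masks = [(rng.random(s) > 0.95).astype(np.int64) for s in shapes]
    print("effective sparsity:", round(effective_sparsity(masks), 4))
```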
arXiv Detail & Related papers (2021-07-05T22:36:57Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net gains the ability of dynamic inference through the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
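
The double-headed dynamic gate is only named, not specified, in this summary. The PyTorch sketch below is a loose interpretation: one head re-weights channels from global context while the other predicts which of several widths to run, with hard selection at inference. All names and the exact head design are assumptions, not DS-Net's architecture.

```python
import torch
import torch.nn as nn


class DynamicWidthGate(nn.Module):
    """Input-dependent gate choosing one of several channel widths.

    Loosely inspired by a double-headed gate: one head re-weights channels
    (SE-style), the other predicts which width to run.
    """

    def __init__(self, channels: int, width_choices=(0.25, 0.5, 0.75, 1.0)):
        super().__init__()
        self.width_choices = width_choices
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.attn_head = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.width_head = nn.Linear(channels, len(width_choices))

    def forward(self, x: torch.Tensor):
        ctx = self.pool(x).flatten(1)                      # (N, C) global context
        attn = self.attn_head(ctx).view(x.size(0), -1, 1, 1)
        width_logits = self.width_head(ctx)                # (N, num_choices)
        # Hard argmax over the batch-averaged logits; training would need a
        # differentiable relaxation (e.g. Gumbel-softmax) and could act per sample.
        choice = width_logits.mean(0).argmax().item()
        ratio = self.width_choices[choice]
        keep = int(x.size(1) * ratio)
        mask = torch.zeros(1, x.size(1), 1, 1, device=x.device)
        mask[:, :keep] = 1.0                               # keep the first `keep` channels
        return x * attn * mask, ratio


if __name__ == "__main__":
    gate = DynamicWidthGate(channels=32)
    x = torch.randn(4, 32, 16, 16)
    y, ratio = gate(x)
    print(y.shape, "running at width ratio", ratio)
```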
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)