PDP: Parameter-free Differentiable Pruning is All You Need
- URL: http://arxiv.org/abs/2305.11203v3
- Date: Fri, 17 Nov 2023 22:25:08 GMT
- Title: PDP: Parameter-free Differentiable Pruning is All You Need
- Authors: Minsik Cho, Saurabh Adya, Devang Naik
- Abstract summary: We propose an efficient yet effective train-time pruning scheme, Parameter-free Differentiable Pruning (PDP), which offers state-of-the-art qualities in model size, accuracy, and training cost.
While differentiable, the simplicity and efficiency of PDP make it universal enough to deliver state-of-the-art random/structured/channel pruning results.
- Score: 9.050217604438458
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: DNN pruning is a popular way to reduce the size of a model, improve the
inference latency, and minimize the power consumption on DNN accelerators.
However, existing approaches might be too complex, expensive or ineffective to
apply to a variety of vision/language tasks, DNN architectures and to honor
structured pruning constraints. In this paper, we propose an efficient yet
effective train-time pruning scheme, Parameter-free Differentiable Pruning
(PDP), which offers state-of-the-art qualities in model size, accuracy, and
training cost. PDP uses a dynamic function of weights during training to
generate soft pruning masks for the weights in a parameter-free manner for a
given pruning target. While differentiable, the simplicity and efficiency of
PDP make it universal enough to deliver state-of-the-art
random/structured/channel pruning results on various vision and natural
language tasks. For example, for MobileNet-v1, PDP can achieve 68.2% top-1
ImageNet1k accuracy at 86.6% sparsity, which is 1.7% higher than the accuracy achieved by the
state-of-the-art algorithms. Also, PDP yields over 83.1% accuracy on
Multi-Genre Natural Language Inference with 90% sparsity for BERT, while the
next best from the existing techniques shows 81.5% accuracy. In addition, PDP
can be applied to structured pruning, such as N:M pruning and channel pruning.
For 1:4 structured pruning of ResNet18, PDP improved the top-1 ImageNet1k
accuracy by over 3.6% over the state-of-the-art. For channel pruning of
ResNet50, PDP reduced the top-1 ImageNet1k accuracy by 0.6% from the
state-of-the-art.
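Since the abstract gives only the high-level recipe (a soft mask computed from the current weights and the pruning target, with no extra trainable mask parameters), the following is a minimal PyTorch sketch of one way such a mask could look. The sigmoid shape, the temperature value, and the quantile-based threshold are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: a soft, parameter-free pruning mask computed from the weights
# themselves and a target sparsity. No trainable mask variables are introduced;
# the sigmoid form and the temperature are assumptions for illustration.
import torch

def soft_prune_mask(weight: torch.Tensor, sparsity: float, temperature: float = 1e-4) -> torch.Tensor:
    """Differentiable mask in [0, 1]; roughly `sparsity` of its entries are near 0."""
    # Threshold implied by the pruning target: the `sparsity`-quantile of w^2.
    t2 = torch.quantile(weight.detach().pow(2).flatten(), sparsity)
    # Soft comparison of each squared weight against the threshold.
    return torch.sigmoid((weight.pow(2) - t2) / temperature)

# Usage in a training step: gradients still reach weights that are "almost pruned".
layer = torch.nn.Linear(256, 256)
masked_weight = layer.weight * soft_prune_mask(layer.weight, sparsity=0.9)
```

For N:M patterns such as the 1:4 case mentioned above, the same idea would apply with the threshold taken within each group of M consecutive weights rather than globally.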
Related papers
- MDP: Multidimensional Vision Model Pruning with Latency Constraint [17.256693658926405]
We introduce Multi-Dimensional Pruning (MDP), a novel paradigm that jointly optimizes across a variety of pruning granularities.
Extensive experiments demonstrate that MDP significantly outperforms previous methods, especially at high pruning ratios.
arXiv Detail & Related papers (2025-04-02T23:00:10Z)
- PIP: Perturbation-based Iterative Pruning for Large Language Models [5.511065308044068]
We propose PIP (Perturbation-based Iterative Pruning), a novel double-view structured pruning method to optimize Large Language Models.
Our experiments show that PIP reduces the parameter count by approximately 20% while retaining over 85% of the original model's accuracy.
arXiv Detail & Related papers (2025-01-25T17:10:50Z)
- DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference.
Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training.
We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for the initial epochs to counteract the randomness inherent in the initialization.
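The snippet below is a hedged approximation of that schedule: a few dense warm-up epochs before any pruning, then iterative pruning with a simple |weight x gradient| saliency as a stand-in for DRIVE's dual gradient-based criterion (the function name and the saliency choice are assumptions, not the paper's definitions).

```python
# Hedged sketch: dense warm-up, then iterative pruning. The |w * grad| saliency
# is a simple stand-in; DRIVE's actual dual gradient-based criterion differs.
import torch

def prune_lowest_saliency(model: torch.nn.Module, fraction: float) -> None:
    """Zero the `fraction` of weights with the smallest |weight * gradient|.
    Call after loss.backward() so gradients are populated."""
    scored = [p for p in model.parameters() if p.grad is not None]
    scores = torch.cat([(p * p.grad).abs().flatten() for p in scored])
    threshold = torch.quantile(scores, fraction)
    with torch.no_grad():
        for p in scored:
            p.mul_(((p * p.grad).abs() > threshold).float())

# Assumed schedule: train densely for the first few epochs, then prune a growing
# fraction at the end of each later epoch until the target sparsity is reached.
```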
arXiv Detail & Related papers (2024-04-01T20:44:28Z)
- xMLP: Revolutionizing Private Inference with Exclusive Square Activation [27.092753578066294]
Private Inference (PI) enables deep neural networks (DNNs) to work on private data without leaking sensitive information.
The use of non-linear activations such as ReLU in DNNs can lead to impractically high PI latency.
We propose xMLP, a novel DNN architecture that uses square activations exclusively while maintaining parity in both accuracy and efficiency.
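The key ingredient is easy to show concretely: a square activation is a degree-2 polynomial, so it avoids the comparison operations that make ReLU expensive under private-inference protocols. The block below is an illustrative drop-in, not xMLP's exact architecture.

```python
# Illustrative only: an MLP block where ReLU is replaced by an exclusive square
# activation (x -> x*x), which is polynomial and therefore cheap under PI protocols.
import torch
from torch import nn

class SquareActivation(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x

mlp_block = nn.Sequential(
    nn.Linear(512, 2048),
    SquareActivation(),  # in place of nn.ReLU()
    nn.Linear(2048, 512),
)
```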
arXiv Detail & Related papers (2024-03-12T18:46:56Z)
- Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks [48.089501687522954]
We propose a novel layer-adaptive weight-pruning approach for Deep Neural Networks (DNNs)
Our approach takes into account the collective influence of all layers to design a layer-adaptive pruning scheme.
Our experiments demonstrate the superiority of our approach over existing methods on the ImageNet and CIFAR-10 datasets.
arXiv Detail & Related papers (2023-08-21T03:22:47Z)
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove channels whose importance is smallest.
GDP can be plugged before convolutional layers without bells and whistles, to control the on-and-off of each channel.
Experiments conducted over CIFAR-10 and ImageNet datasets show that the proposed GDP achieves the state-of-the-art performance.
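As a rough illustration of the "plugged before convolutional layers" idea, the snippet below inserts a learnable per-channel gate in front of a convolution; the penalty shown is a simple stand-in that pushes gates toward 0 or 1, not GDP's actual smoothed polarization regularizer.

```python
# Hedged sketch: a per-channel gate in front of a convolution. The penalty below
# merely pushes gates toward {0, 1}; GDP's actual polarization term differs.
import torch
from torch import nn

class ChannelGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(channels))  # one gate per input channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate.view(1, -1, 1, 1)

def polarization_penalty(gate: torch.Tensor) -> torch.Tensor:
    return (gate * (1.0 - gate)).abs().sum()  # zero only when every gate is 0 or 1

gated_conv = nn.Sequential(ChannelGate(64), nn.Conv2d(64, 128, kernel_size=3, padding=1))
```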
arXiv Detail & Related papers (2021-09-06T03:17:10Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps)
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP)
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
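To make "different granularities" concrete, the snippet below computes the same magnitude statistic per weight, per kernel, and per filter for a convolutional weight tensor; DPP's probabilistic sampling on top of such scores is not reproduced here.

```python
# Illustration of pruning granularities for a conv weight of shape
# (out_channels, in_channels, kH, kW); DPP's sampling mechanism is not shown.
import torch

weight = torch.randn(64, 32, 3, 3)            # (filters, in_channels, kH, kW)
per_weight = weight.abs()                     # finest granularity: single weights
per_kernel = weight.abs().sum(dim=(2, 3))     # one score per 3x3 kernel
per_filter = weight.abs().sum(dim=(1, 2, 3))  # one score per filter / feature map
```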
arXiv Detail & Related papers (2021-05-26T17:01:52Z)
- Hessian-Aware Pruning and Optimal Neural Implant [74.3282611517773]
Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models.
We introduce a new Hessian Aware Pruning method coupled with a Neural Implant approach that uses second-order sensitivity as a metric for structured pruning.
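Second-order sensitivity is typically estimated without forming the full Hessian; one common route is Hutchinson's trace estimator built from Hessian-vector products, sketched below. HAP's exact channel-level metric and its neural-implant step are not shown.

```python
# Hutchinson estimate of tr(H): average v^T H v over random Rademacher vectors v.
# This is a generic second-order sensitivity helper, not HAP's exact metric.
import torch

def hessian_trace_estimate(loss: torch.Tensor, parameters, n_samples: int = 8) -> float:
    params = [p for p in parameters if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # entries in {-1, +1}
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        estimate += sum((v * hv).sum().item() for v, hv in zip(vs, hvs))
    return estimate / n_samples
```

Per-layer (or per-channel) trace estimates of this kind can then be compared to rank structures for pruning.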
arXiv Detail & Related papers (2021-01-22T04:08:03Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
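The message-passing algorithm referenced here reads as an Affinity-Propagation-style exemplar selection; the snippet below shows that idea with scikit-learn's implementation and its default similarity, which may differ from the paper's exact setup.

```python
# Hedged sketch: pick "exemplar" filters with message-passing clustering
# (Affinity Propagation), so the number of kept filters is decided adaptively.
import torch
from sklearn.cluster import AffinityPropagation

conv_weight = torch.randn(64, 32, 3, 3)           # (filters, in_channels, kH, kW)
filters = conv_weight.reshape(64, -1).numpy()      # one row per filter
ap = AffinityPropagation(random_state=0).fit(filters)
exemplar_idx = torch.as_tensor(ap.cluster_centers_indices_)
kept_filters = conv_weight[exemplar_idx]           # exemplar filters survive pruning
```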
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks [9.409651543514615]
This work introduces convolutional layers with pre-defined sparse 2D kernels that have support sets that repeat periodically within and across filters.
Due to the efficient storage of our periodic sparse kernels, the parameter savings can translate into considerable improvements in energy efficiency.
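As a purely illustrative example of a periodically repeating support set, the snippet below reuses two fixed 3x3 patterns across the kernel grid, so only a pattern index per kernel needs to be stored; the actual patterns and period are the ones defined in the paper, not these.

```python
# Illustrative periodic sparsity: a small set of fixed 3x3 support masks is
# reused across filters, so storing a pattern index per kernel is enough.
import torch

patterns = torch.tensor([                    # two assumed 3x3 support sets
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
], dtype=torch.float32)

out_ch, in_ch = 64, 32
# Assign patterns with period 2 across the (out_ch, in_ch) kernel grid.
idx = (torch.arange(out_ch)[:, None] + torch.arange(in_ch)[None, :]) % len(patterns)
mask = patterns[idx]                          # shape: (out_ch, in_ch, 3, 3)

weight = torch.randn(out_ch, in_ch, 3, 3)
sparse_weight = weight * mask                 # kernels keep only the periodic support
```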
arXiv Detail & Related papers (2020-01-29T07:10:56Z)