PDP: Parameter-free Differentiable Pruning is All You Need
- URL: http://arxiv.org/abs/2305.11203v3
- Date: Fri, 17 Nov 2023 22:25:08 GMT
- Title: PDP: Parameter-free Differentiable Pruning is All You Need
- Authors: Minsik Cho, Saurabh Adya, Devang Naik
- Abstract summary: We propose an efficient yet effective train-time pruning scheme, Parameter-free Differentiable Pruning (PDP), which offers state-of-the-art qualities in model size, accuracy, and training cost.
While differentiable, the simplicity and efficiency of PDP make it universal enough to deliver state-of-the-art random/structured/channel pruning results.
- Score: 9.050217604438458
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: DNN pruning is a popular way to reduce the size of a model, improve the
inference latency, and minimize the power consumption on DNN accelerators.
However, existing approaches might be too complex, expensive or ineffective to
apply to a variety of vision/language tasks, DNN architectures and to honor
structured pruning constraints. In this paper, we propose an efficient yet
effective train-time pruning scheme, Parameter-free Differentiable Pruning
(PDP), which offers state-of-the-art qualities in model size, accuracy, and
training cost. PDP uses a dynamic function of weights during training to
generate soft pruning masks for the weights in a parameter-free manner for a
given pruning target. While differentiable, the simplicity and efficiency of
PDP make it universal enough to deliver state-of-the-art
random/structured/channel pruning results on various vision and natural
language tasks. For example, for MobileNet-v1, PDP can achieve 68.2% top-1
ImageNet1k accuracy at 86.6% sparsity, which is 1.7% higher than the accuracy achieved by the
state-of-the-art algorithms. Also, PDP yields over 83.1% accuracy on
Multi-Genre Natural Language Inference with 90% sparsity for BERT, while the
next best from the existing techniques shows 81.5% accuracy. In addition, PDP
can be applied to structured pruning, such as N:M pruning and channel pruning.
For 1:4 structured pruning of ResNet18, PDP improved the top-1 ImageNet1k
accuracy by over 3.6% over the state-of-the-art. For channel pruning of
ResNet50, PDP reduced the top-1 ImageNet1k accuracy by 0.6% from the
state-of-the-art.
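Since the abstract gives only the high-level recipe (a soft mask computed from the current weights and the pruning target, with no extra trainable mask parameters), the following is a minimal PyTorch sketch of one way such a mask could look. The sigmoid shape, the temperature value, and the quantile-based threshold are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: a soft, parameter-free pruning mask computed from the weights
# themselves and a target sparsity. No trainable mask variables are introduced;
# the sigmoid form and the temperature are assumptions for illustration.
import torch

def soft_prune_mask(weight: torch.Tensor, sparsity: float, temperature: float = 1e-4) -> torch.Tensor:
    """Differentiable mask in [0, 1]; roughly `sparsity` of its entries are near 0."""
    # Threshold implied by the pruning target: the `sparsity`-quantile of w^2.
    t2 = torch.quantile(weight.detach().pow(2).flatten(), sparsity)
    # Soft comparison of each squared weight against the threshold.
    return torch.sigmoid((weight.pow(2) - t2) / temperature)

# Usage in a training step: gradients still reach weights that are "almost pruned".
layer = torch.nn.Linear(256, 256)
masked_weight = layer.weight * soft_prune_mask(layer.weight, sparsity=0.9)
```

For N:M patterns such as the 1:4 case mentioned above, the same idea would apply with the threshold taken within each group of M consecutive weights rather than globally.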
Related papers
- MDP: Multidimensional Vision Model Pruning with Latency Constraint [17.256693658926405]
We introduce Multi-Dimensional Pruning (MDP), a novel paradigm that jointly optimizes across a variety of pruning granularities.
Extensive experiments demonstrate that MDP significantly outperforms previous methods, especially at high pruning ratios.
arXiv Detail & Related papers (2025-04-02T23:00:10Z)
- PIP: Perturbation-based Iterative Pruning for Large Language Models [5.511065308044068]
We propose PIP (Perturbation-based Iterative Pruning), a novel double-view structured pruning method to optimize Large Language Models.
Our experiments show that PIP reduces the parameter count by approximately 20% while retaining over 85% of the original model's accuracy.
arXiv Detail & Related papers (2025-01-25T17:10:50Z)
- DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference.
Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training.
We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for the initial epochs to counteract the randomness inherent in the initialization.
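The snippet below is a hedged approximation of that schedule: a few dense warm-up epochs before any pruning, then iterative pruning with a simple |weight x gradient| saliency as a stand-in for DRIVE's dual gradient-based criterion (the function name and the saliency choice are assumptions, not the paper's definitions).

```python
# Hedged sketch: dense warm-up, then iterative pruning. The |w * grad| saliency
# is a simple stand-in; DRIVE's actual dual gradient-based criterion differs.
import torch

def prune_lowest_saliency(model: torch.nn.Module, fraction: float) -> None:
    """Zero the `fraction` of weights with the smallest |weight * gradient|.
    Call after loss.backward() so gradients are populated."""
    scored = [p for p in model.parameters() if p.grad is not None]
    scores = torch.cat([(p * p.grad).abs().flatten() for p in scored])
    threshold = torch.quantile(scores, fraction)
    with torch.no_grad():
        for p in scored:
            p.mul_(((p * p.grad).abs() > threshold).float())

# Assumed schedule: train densely for the first few epochs, then prune a growing
# fraction at the end of each later epoch until the target sparsity is reached.
```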
arXiv Detail & Related papers (2024-04-01T20:44:28Z)
- xMLP: Revolutionizing Private Inference with Exclusive Square Activation [27.092753578066294]
Private Inference (PI) enables deep neural networks (DNNs) to work on private data without leaking sensitive information.
The use of non-linear activations such as ReLU in DNNs can lead to impractically high PI latency.
We propose xMLP, a novel DNN architecture that uses square activations exclusively while maintaining parity in both accuracy and efficiency.
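The key ingredient is easy to show concretely: a square activation is a degree-2 polynomial, so it avoids the comparison operations that make ReLU expensive under private-inference protocols. The block below is an illustrative drop-in, not xMLP's exact architecture.

```python
# Illustrative only: an MLP block where ReLU is replaced by an exclusive square
# activation (x -> x*x), which is polynomial and therefore cheap under PI protocols.
import torch
from torch import nn

class SquareActivation(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x

mlp_block = nn.Sequential(
    nn.Linear(512, 2048),
    SquareActivation(),  # in place of nn.ReLU()
    nn.Linear(2048, 512),
)
```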
arXiv Detail & Related papers (2024-03-12T18:46:56Z)
- Efficient Joint Optimization of Layer-Adaptive Weight Pruning in Deep Neural Networks [48.089501687522954]
We propose a novel layer-adaptive weight-pruning approach for Deep Neural Networks (DNNs)
Our approach takes into account the collective influence of all layers to design a layer-adaptive pruning scheme.
Our experiments demonstrate the superiority of our approach over existing methods on the ImageNet and CIFAR-10 datasets.
arXiv Detail & Related papers (2023-08-21T03:22:47Z)
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove channels whose importance is smallest.
GDP can be plugged before convolutional layers without bells and whistles, to control the on-and-off of each channel.
Experiments conducted over CIFAR-10 and ImageNet datasets show that the proposed GDP achieves the state-of-the-art performance.
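As a rough illustration of the "plugged before convolutional layers" idea, the snippet below inserts a learnable per-channel gate in front of a convolution; the penalty shown is a simple stand-in that pushes gates toward 0 or 1, not GDP's actual smoothed polarization regularizer.

```python
# Hedged sketch: a per-channel gate in front of a convolution. The penalty below
# merely pushes gates toward {0, 1}; GDP's actual polarization term differs.
import torch
from torch import nn

class ChannelGate(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(channels))  # one gate per input channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate.view(1, -1, 1, 1)

def polarization_penalty(gate: torch.Tensor) -> torch.Tensor:
    return (gate * (1.0 - gate)).abs().sum()  # zero only when every gate is 0 or 1

gated_conv = nn.Sequential(ChannelGate(64), nn.Conv2d(64, 128, kernel_size=3, padding=1))
```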
arXiv Detail & Related papers (2021-09-06T03:17:10Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Dynamic Probabilistic Pruning: A general framework for hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps)
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP)
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
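To make "different granularities" concrete, the snippet below computes the same magnitude statistic per weight, per kernel, and per filter for a convolutional weight tensor; DPP's probabilistic sampling on top of such scores is not reproduced here.

```python
# Illustration of pruning granularities for a conv weight of shape
# (out_channels, in_channels, kH, kW); DPP's sampling mechanism is not shown.
import torch

weight = torch.randn(64, 32, 3, 3)            # (filters, in_channels, kH, kW)
per_weight = weight.abs()                     # finest granularity: single weights
per_kernel = weight.abs().sum(dim=(2, 3))     # one score per 3x3 kernel
per_filter = weight.abs().sum(dim=(1, 2, 3))  # one score per filter / feature map
```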
arXiv Detail & Related papers (2021-05-26T17:01:52Z)
- Hessian-Aware Pruning and Optimal Neural Implant [74.3282611517773]
Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models.
We introduce a new Hessian Aware Pruning method coupled with a Neural Implant approach that uses second-order sensitivity as a metric for structured pruning.
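Second-order sensitivity is typically estimated without forming the full Hessian; one common route is Hutchinson's trace estimator built from Hessian-vector products, sketched below. HAP's exact channel-level metric and its neural-implant step are not shown.

```python
# Hutchinson estimate of tr(H): average v^T H v over random Rademacher vectors v.
# This is a generic second-order sensitivity helper, not HAP's exact metric.
import torch

def hessian_trace_estimate(loss: torch.Tensor, parameters, n_samples: int = 8) -> float:
    params = [p for p in parameters if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimate = 0.0
    for _ in range(n_samples):
        vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]  # entries in {-1, +1}
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        estimate += sum((v * hv).sum().item() for v, hv in zip(vs, hvs))
    return estimate / n_samples
```

Per-layer (or per-channel) trace estimates of this kind can then be compared to rank structures for pruning.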
arXiv Detail & Related papers (2021-01-22T04:08:03Z)
- Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
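The message-passing algorithm referenced here reads as an Affinity-Propagation-style exemplar selection; the snippet below shows that idea with scikit-learn's implementation and its default similarity, which may differ from the paper's exact setup.

```python
# Hedged sketch: pick "exemplar" filters with message-passing clustering
# (Affinity Propagation), so the number of kept filters is decided adaptively.
import torch
from sklearn.cluster import AffinityPropagation

conv_weight = torch.randn(64, 32, 3, 3)           # (filters, in_channels, kH, kW)
filters = conv_weight.reshape(64, -1).numpy()      # one row per filter
ap = AffinityPropagation(random_state=0).fit(filters)
exemplar_idx = torch.as_tensor(ap.cluster_centers_indices_)
kept_filters = conv_weight[exemplar_idx]           # exemplar filters survive pruning
```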
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
- Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks [9.409651543514615]
This work introduces convolutional layers with pre-defined sparse 2D kernels that have support sets that repeat periodically within and across filters.
Due to the efficient storage of our periodic sparse kernels, the parameter savings can translate into considerable improvements in energy efficiency.
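As a purely illustrative example of a periodically repeating support set, the snippet below reuses two fixed 3x3 patterns across the kernel grid, so only a pattern index per kernel needs to be stored; the actual patterns and period are the ones defined in the paper, not these.

```python
# Illustrative periodic sparsity: a small set of fixed 3x3 support masks is
# reused across filters, so storing a pattern index per kernel is enough.
import torch

patterns = torch.tensor([                    # two assumed 3x3 support sets
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
], dtype=torch.float32)

out_ch, in_ch = 64, 32
# Assign patterns with period 2 across the (out_ch, in_ch) kernel grid.
idx = (torch.arange(out_ch)[:, None] + torch.arange(in_ch)[None, :]) % len(patterns)
mask = patterns[idx]                          # shape: (out_ch, in_ch, 3, 3)

weight = torch.randn(out_ch, in_ch, 3, 3)
sparse_weight = weight * mask                 # kernels keep only the periodic support
```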
arXiv Detail & Related papers (2020-01-29T07:10:56Z)