Data-Efficient Structured Pruning via Submodular Optimization
- URL: http://arxiv.org/abs/2203.04940v1
- Date: Wed, 9 Mar 2022 18:40:29 GMT
- Title: Data-Efficient Structured Pruning via Submodular Optimization
- Authors: Marwa El Halabi, Suraj Srinivas, Simon Lacoste-Julien
- Abstract summary: We propose a data-efficient structured pruning method based on submodular optimization.
We show that this selection problem is a weakly submodular maximization problem, and thus can be provably approximated using an efficient greedy algorithm.
Our method is one of the few in the literature that uses only a limited number of training samples and no labels.
- Score: 32.574190896543705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured pruning is an effective approach for compressing large pre-trained
neural networks without significantly affecting their performance, which
involves removing redundant regular regions of weights. However, current
structured pruning methods are highly empirical in nature, do not provide any
theoretical guarantees, and often require fine-tuning, which makes them
inapplicable in the limited-data regime. We propose a principled data-efficient
structured pruning method based on submodular optimization. In particular, for
a given layer, we select neurons/channels to prune and corresponding new
weights for the next layer that minimize the change in the next layer's input
induced by pruning. We show that this selection problem is a weakly submodular
maximization problem, and thus can be provably approximated using an efficient
greedy algorithm. Our method is one of the few in the literature that uses only
a limited number of training samples and no labels. Our experimental results
demonstrate that our method outperforms popular baseline methods in various
one-shot pruning settings.
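The selection step described in the abstract lends itself to a short sketch: greedily pick the neurons to keep so that a least-squares reweighting of the next layer best reproduces its original input. The NumPy code below is a minimal, hypothetical rendering (the naive inner loop and function names are ours, not the authors' implementation); note that it needs only unlabeled activations, consistent with the no-labels claim.

```python
# Minimal sketch of the selection step: greedily keep the neurons whose
# activations best reconstruct the next layer's input, and reweight the
# next layer by least squares. Simplified illustrative code, not the
# authors' implementation (a practical version would use faster updates).
import numpy as np

def greedy_prune_layer(A, W_next, k):
    """A: (n_samples, n_neurons) activations of the layer being pruned.
    W_next: (n_neurons, n_out) next-layer weights. k: neurons to keep.
    Returns kept neuron indices and new next-layer weights for them."""
    target = A @ W_next                      # next layer's input, unpruned
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(A.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            # Least-squares reweighting of the next layer on kept neurons
            w, *_ = np.linalg.lstsq(A[:, cols], target, rcond=None)
            err = np.linalg.norm(A[:, cols] @ w - target) ** 2
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    W_new, *_ = np.linalg.lstsq(A[:, selected], target, rcond=None)
    return selected, W_new
```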
Related papers
- Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method completes in about 2.7 hours using around 35GB of memory for 13B models on a single A100 GPU.
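As a rough illustration of the mask-learning loop this summary describes, the sketch below samples Bernoulli masks and updates their probabilities with a REINFORCE-style policy gradient. Here `pruned_loss` is a hypothetical stand-in for a forward pass of the masked model, and all hyperparameters are illustrative.

```python
# Hedged sketch: learn pruning masks in a probabilistic space by policy
# gradient (REINFORCE). The loss below is a toy stand-in for evaluating
# the pruned model; higher-importance structures cost more to remove.
import numpy as np

rng = np.random.default_rng(0)

def pruned_loss(mask):
    # Placeholder: in practice, run the model with structures zeroed by mask.
    importance = np.linspace(0.1, 1.0, mask.size)
    return float(np.sum(importance * (1 - mask)))

probs = np.full(16, 0.9)        # keep-probability per prunable structure
lr, baseline = 0.05, 0.0
for step in range(200):
    mask = (rng.random(probs.size) < probs).astype(float)
    loss = pruned_loss(mask)
    # REINFORCE: grad E[L] = E[(L - b) * d log P(mask) / d probs]
    score = (mask - probs) / (probs * (1 - probs) + 1e-8)
    probs -= lr * (loss - baseline) * score
    probs = np.clip(probs, 0.01, 0.99)
    baseline = 0.9 * baseline + 0.1 * loss   # running-mean variance reduction
```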
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - Effective Layer Pruning Through Similarity Metric Perspective [0.0]
Deep neural networks have been the predominant paradigm in machine learning for solving cognitive tasks.
Pruning structures from these models is a straightforward approach to reducing network complexity.
Layer pruning, however, often hurts the network's predictive ability (i.e., accuracy) at high compression rates.
This work introduces an effective layer-pruning strategy that meets all underlying properties pursued by pruning methods.
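The summary does not spell out the similarity metric; linear CKA is one common choice for comparing layer representations, so the sketch below (an assumption on our part, not necessarily the paper's metric) ranks layers as pruning candidates by how little their output representation differs from their input.

```python
# Illustrative sketch: rank layers for removal by input/output similarity,
# using linear CKA as one plausible metric (the paper's may differ).
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X, Y of shape (n_samples, dim)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def rank_layers_for_pruning(layer_inputs, layer_outputs):
    """Layers whose output barely changes the representation come first."""
    scores = [linear_cka(x, y) for x, y in zip(layer_inputs, layer_outputs)]
    return np.argsort(scores)[::-1]   # most redundant (highest similarity) first
```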
arXiv Detail & Related papers (2024-05-27T11:54:51Z) - A Unified Framework for Soft Threshold Pruning [27.853698217792456]
We reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA).
We derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework.
In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD.
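In ISTA terms, each training step is a gradient update followed by the soft-threshold (shrinkage) operator, with the threshold following a schedule. A minimal sketch, using a simple linear ramp rather than the optimal scheduler the paper derives:

```python
# Minimal sketch of soft threshold pruning as ISTA: an SGD step followed by
# the L1 proximal (shrinkage) operator, with a scheduled threshold.
import numpy as np

def soft_threshold(w, thr):
    """ISTA shrinkage operator: proximal map of the L1 penalty."""
    return np.sign(w) * np.maximum(np.abs(w) - thr, 0.0)

def ista_step(w, grad, lr, thr):
    # Gradient step, then shrinkage scaled by the learning rate.
    return soft_threshold(w - lr * grad, lr * thr)

def threshold_at(step, total_steps, thr_final):
    # Illustrative linear ramp; the paper derives an optimal scheduler.
    return thr_final * min(1.0, step / (0.5 * total_steps))
```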
arXiv Detail & Related papers (2023-02-25T08:16:14Z) - Low-rank Tensor Decomposition for Compression of Convolutional Neural
Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress pre-trained networks using low-rank tensor decomposition.
A new regularization method, called the funnel function, is proposed to suppress unimportant factors during compression.
For ResNet18 on ImageNet2012, our reduced model achieves more than a two-times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
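The basic compression step is standard: a truncated SVD splits one dense layer into two thinner ones. The sketch below shows only that step; the funnel-function regularization, which drives unimportant factors toward zero during training, is not reproduced here.

```python
# Minimal sketch of low-rank compression: replace W with two thin factors
# via truncated SVD. The paper's funnel regularization is not shown.
import numpy as np

def low_rank_factorize(W, rank):
    """Split W (out, in) into W2 @ W1 with inner dimension `rank`."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(np.sqrt(s[:rank])) @ Vt[:rank]     # (rank, in)
    W2 = U[:, :rank] @ np.diag(np.sqrt(s[:rank]))   # (out, rank)
    return W1, W2

W = np.random.randn(256, 512)
W1, W2 = low_rank_factorize(W, rank=64)
print(np.linalg.norm(W - W2 @ W1) / np.linalg.norm(W))  # relative error
```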
arXiv Detail & Related papers (2021-12-07T13:41:51Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely translate into real inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
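The summary leaves the micro-structure unspecified; one plausible reading, sketched below with illustrative block sizes and rules of our own choosing, is to handle weights in small hardware-friendly blocks and either zero a block or unify its weights to a shared magnitude.

```python
# Loose sketch of micro-structured unification: process weights in small
# blocks; prune low-norm blocks, unify the rest to a shared magnitude.
# Block size and rules are illustrative assumptions, not the paper's.
import numpy as np

def unify_blocks(W, block=4, prune_frac=0.5):
    out, inp = W.shape
    assert inp % block == 0
    Wb = W.reshape(out, inp // block, block)
    norms = np.linalg.norm(Wb, axis=-1)
    keep = norms > np.quantile(norms, prune_frac)   # prune low-norm blocks
    shared = np.abs(Wb).mean(-1, keepdims=True)     # one magnitude per block
    unified = np.sign(Wb) * shared                  # unify kept blocks
    return (unified * keep[..., None]).reshape(out, inp)
```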
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - MLPruning: A Multilevel Structured Pruning Framework for
Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
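A sketch of the coarsest of the three levels, head pruning, is below; the importance proxy and names are our illustrative assumptions, and the row- and block-level masks would work analogously at finer granularity.

```python
# Illustrative sketch of the head-pruning level: mask out attention heads
# with the lowest importance scores. Not the paper's exact procedure.
import numpy as np

def prune_heads(attn_out, head_scores, keep):
    """attn_out: (batch, n_heads, seq, d_head); zero low-scoring heads."""
    kept = np.argsort(head_scores)[::-1][:keep]
    mask = np.zeros(attn_out.shape[1])
    mask[kept] = 1.0
    return attn_out * mask[None, :, None, None]
```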
arXiv Detail & Related papers (2021-05-30T22:00:44Z) - Dynamic Probabilistic Pruning: A general framework for
hardware-constrained pruning at different granularities [80.06422693778141]
We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
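In the spirit of the summary, the sketch below samples a filter-level mask probabilistically via Gumbel top-k on learned logits; this hard-sampling version is a simplification of our own, not the exact DPP procedure.

```python
# Hedged sketch of probabilistic pruning: draw k surviving filters without
# replacement by adding Gumbel noise to learned logits (hard version shown;
# the full method uses a differentiable relaxation).
import numpy as np

rng = np.random.default_rng(0)

def sample_filter_mask(logits, k):
    gumbel = -np.log(-np.log(rng.random(logits.size)))
    kept = np.argsort(logits + gumbel)[::-1][:k]
    mask = np.zeros(logits.size)
    mask[kept] = 1.0
    return mask

mask = sample_filter_mask(rng.standard_normal(64), k=32)  # keep 32 of 64 filters
```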
arXiv Detail & Related papers (2021-05-26T17:01:52Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, showing better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - Towards Optimal Filter Pruning with Balanced Performance and Pruning
Speed [17.115185960327665]
We propose a filter pruning method that balances performance and pruning speed.
Our method can prune a layer at an approximately layer-wise optimal pruning rate under a preset loss variation.
The proposed pruning method is widely applicable to common architectures and does not involve any additional training except the final fine-tuning.
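One way to read "approximately layer-wise optimal pruning rate under a preset loss variation" is a search over rates under a loss budget. The sketch below uses a binary search with a hypothetical `evaluate_loss` stand-in; it assumes loss grows monotonically with the pruning rate, which the paper's actual procedure may not require.

```python
# Illustrative sketch: find the largest pruning rate for a layer whose loss
# increase stays within a preset tolerance. `evaluate_loss(rate)` stands in
# for evaluating the network with that fraction of the layer pruned.
def optimal_rate(evaluate_loss, base_loss, tol, iters=12):
    lo, hi = 0.0, 1.0
    for _ in range(iters):                 # binary search on the pruning rate
        mid = (lo + hi) / 2
        if evaluate_loss(mid) - base_loss <= tol:
            lo = mid                       # within budget: prune more
        else:
            hi = mid                       # over budget: prune less
    return lo
```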
arXiv Detail & Related papers (2020-10-14T06:17:09Z) - Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning [83.99191569112682]
Magnitude-based pruning is one of the simplest methods for pruning neural networks.
We develop a simple pruning method, coined lookahead pruning, by extending the single layer optimization to a multi-layer optimization.
Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks.
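The lookahead idea can be sketched as scoring each weight not by its magnitude alone but by its magnitude times the norms of the connected weights in the neighboring layers, approximating its effect on the three-layer block; the code below is a simplified rendering of ours for fully connected layers.

```python
# Simplified sketch of lookahead scoring: weight magnitude multiplied by the
# norms of the weights it connects to in the previous and next layers.
import numpy as np

def lookahead_scores(W_prev, W, W_next):
    """W: (out, in); W_prev: (in, d0) feeds W's inputs; W_next: (d2, out)."""
    in_norms = np.linalg.norm(W_prev, axis=1)    # per input neuron of W
    out_norms = np.linalg.norm(W_next, axis=0)   # per output neuron of W
    return np.abs(W) * in_norms[None, :] * out_norms[:, None]

def lookahead_prune(W_prev, W, W_next, frac):
    s = lookahead_scores(W_prev, W, W_next)
    return W * (s >= np.quantile(s, frac))       # zero the lowest-scored weights
```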
arXiv Detail & Related papers (2020-02-12T05:38:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.