Dynamic Probabilistic Pruning: A general framework for
hardware-constrained pruning at different granularities
- URL: http://arxiv.org/abs/2105.12686v1
- Date: Wed, 26 May 2021 17:01:52 GMT
- Title: Dynamic Probabilistic Pruning: A general framework for
hardware-constrained pruning at different granularities
- Authors: Lizeth Gonzalez-Carabarin, Iris A.M. Huijben, Bastiaan S. Veeling,
Alexandre Schmid, Ruud J.G. van Sloun
- Abstract summary: We propose a flexible new pruning mechanism that facilitates pruning at different granularities (weights, kernels, filters/feature maps).
We refer to this algorithm as Dynamic Probabilistic Pruning (DPP).
We show that DPP achieves competitive compression rates and classification accuracy when pruning common deep learning models trained on different benchmark datasets for image classification.
- Score: 80.06422693778141
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unstructured neural network pruning algorithms have achieved impressive
compression rates. However, the resulting - typically irregular - sparse
matrices hamper efficient hardware implementations, leading to additional
memory usage and complex control logic that diminishes the benefits of
unstructured pruning. This has spurred structured coarse-grained pruning
solutions that prune entire filters or even layers, enabling efficient
implementation at the expense of reduced flexibility. Here we propose a
flexible new pruning mechanism that facilitates pruning at different
granularities (weights, kernels, filters/feature maps), while retaining
efficient memory organization (e.g. pruning exactly k-out-of-n weights for
every output neuron, or pruning exactly k-out-of-n kernels for every feature
map). We refer to this algorithm as Dynamic Probabilistic Pruning (DPP). DPP
leverages the Gumbel-softmax relaxation for differentiable k-out-of-n sampling,
facilitating end-to-end optimization. We show that DPP achieves competitive
compression rates and classification accuracy when pruning common deep learning
models trained on different benchmark datasets for image classification.
Relevantly, the non-magnitude-based nature of DPP allows for joint optimization
of pruning and weight quantization in order to even further compress the
network, which we show as well. Finally, we propose novel information theoretic
metrics that show the confidence and pruning diversity of pruning masks within
a layer.
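The core sampling mechanism lends itself to a compact sketch. Below is a minimal PyTorch illustration of differentiable k-out-of-n sampling via a Gumbel perturbation followed by a successive-softmax relaxation (in the style of Xie & Ermon, 2019); DPP's exact sampler and parameterization may differ, and `gumbel_topk_mask` is a name chosen here for illustration.

```python
import torch

def gumbel_topk_mask(logits, k, tau=1.0, hard=True, eps=1e-12):
    """Differentiable k-out-of-n sampling: Gumbel-perturb the logits, then
    build a relaxed k-hot mask with k successive softmax steps.
    A sketch of the kind of sampler DPP builds on, not its exact recipe."""
    u = torch.rand_like(logits).clamp_min(eps)
    g = -torch.log((-torch.log(u)).clamp_min(eps))      # Gumbel(0, 1) noise
    scores = logits + g
    khot = torch.zeros_like(scores)
    onehot_approx = torch.zeros_like(scores)
    for _ in range(k):                                  # select k items softly
        scores = scores + torch.log((1.0 - onehot_approx).clamp_min(eps))
        onehot_approx = torch.softmax(scores / tau, dim=-1)
        khot = khot + onehot_approx
    if hard:
        # straight-through: exact k-hot mask forward, soft gradients backward
        idx = khot.topk(k, dim=-1).indices
        hard_mask = torch.zeros_like(khot).scatter_(-1, idx, 1.0)
        khot = hard_mask - khot.detach() + khot
    return khot

# Pruning exactly k-out-of-n weights per output neuron: one mask row per neuron.
logits = torch.randn(64, 128, requires_grad=True)  # learnable selection logits
mask = gumbel_topk_mask(logits, k=16)              # 16 of 128 weights survive
```

Multiplying a weight tensor by such a mask enforces the fixed k-out-of-n budget the abstract describes while keeping the selection trainable end-to-end.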
Related papers
- MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning [7.262751938473306]
Pruning is a well-established technique that reduces the size of neural networks while aiming to preserve accuracy.
We develop a new pruning algorithm, MPruner, that leverages mutual information through vector similarity.
MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
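The CKA signal driving such similarity-based pruning is easy to state. Below is a hedged NumPy sketch of linear CKA between two activation matrices; how MPruner aggregates these scores into pruning decisions is not specified here, and `linear_cka` is an illustrative name.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices of
    shape (n_samples, n_features). Scores near 1 mean the two layers carry
    nearly the same information, marking one as a pruning candidate."""
    X = X - X.mean(axis=0, keepdims=True)   # center features
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") *
                   np.linalg.norm(Y.T @ Y, "fro"))
```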
arXiv Detail & Related papers (2024-08-24T05:54:47Z)
- Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models [1.5807079236265718]
KEN is a straightforward, universal, and unstructured pruning algorithm based on Kernel Density Estimation (KDE).
KEN aims to construct optimized transformers by selectively preserving the most significant parameters while restoring the others to their pre-training state.
KEN-pruned models achieve equal or better performance than their original unpruned versions, with a minimum parameter reduction of 25%.
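As a rough illustration of this select-and-reset idea, the sketch below scores each row's parameters with SciPy's gaussian_kde and restores the rest to their pre-trained values; the keep ratio, the per-row granularity, and the choice of retaining the highest-density points are assumptions, not KEN's documented rule.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_select_and_reset(fine_tuned, pretrained, keep_ratio=0.75):
    """Keep the entries KDE scores highest in each row of a fine-tuned
    matrix; reset the remainder to their pre-training values.
    A hypothetical sketch of the abstract's idea, not KEN itself."""
    out = pretrained.copy()
    for i, row in enumerate(fine_tuned):
        density = gaussian_kde(row)(row)        # per-parameter density score
        k = max(1, int(keep_ratio * row.size))
        keep = np.argsort(density)[-k:]         # indices of retained weights
        out[i, keep] = row[keep]
    return out
```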
arXiv Detail & Related papers (2024-02-05T16:11:43Z)
- Dynamic Structure Pruning for Compressing CNNs [13.73717878732162]
We introduce a novel structure pruning method, termed dynamic structure pruning, to identify optimal pruning granularities for intra-channel pruning.
The experimental results show that dynamic structure pruning achieves state-of-the-art pruning performance and greater practical acceleration on a GPU than channel pruning.
arXiv Detail & Related papers (2023-03-17T02:38:53Z)
- Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs [69.3939291118954]
Unstructured pruning is well suited to reducing the memory footprint of convolutional neural networks (CNNs).
Standard unstructured pruning (SP) does so by setting individual filter elements to zero.
We introduce interspace pruning (IP), a general tool to improve existing pruning methods.
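For reference, the SP baseline named above amounts to plain magnitude pruning in the fixed standard basis, as in the sketch below; IP's contribution is to learn the basis in which this sparsity is imposed.

```python
import torch

def magnitude_prune(weight, sparsity=0.9):
    """Standard unstructured pruning (SP): zero the smallest-magnitude
    entries of a weight tensor, leaving an irregular sparse pattern."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)
```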
arXiv Detail & Related papers (2022-03-15T11:50:45Z)
- Data-Efficient Structured Pruning via Submodular Optimization [32.574190896543705]
We propose a data-efficient structured pruning method based on submodular optimization.
We show that this selection problem is a weakly submodular problem, thus it can be provably approximated using an efficient greedy algorithm.
Our method is one of the few in the literature that uses only a limited number of training samples and no labels.
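The greedy routine that weak submodularity licenses is the textbook one, and it is what gives the method its approximation guarantee. In the generic sketch below, `gain` stands in for the paper's marginal-gain oracle over candidate structures (e.g. filters).

```python
def greedy_select(candidates, gain, budget):
    """Greedy maximization of a (weakly) submodular set function:
    repeatedly add the element with the largest marginal gain
    gain(S, e) = f(S + {e}) - f(S) until the budget is reached."""
    selected = []
    remaining = set(candidates)
    while remaining and len(selected) < budget:
        best = max(remaining, key=lambda e: gain(selected, e))
        selected.append(best)
        remaining.remove(best)
    return selected
```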
arXiv Detail & Related papers (2022-03-09T18:40:29Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
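Algorithm unfolding itself is simple to picture. The generic sketch below unrolls gradient descent on a least-squares objective, standing in for the GDPA-based solver, which is considerably more involved.

```python
import torch
import torch.nn as nn

class UnrolledGD(nn.Module):
    """Algorithm unfolding in miniature: each solver iteration becomes a
    layer with its own learnable step size, so an L-iteration algorithm
    turns into an interpretable L-layer network."""

    def __init__(self, n_layers=5):
        super().__init__()
        self.steps = nn.Parameter(torch.full((n_layers,), 0.1))

    def forward(self, A, b):
        x = torch.zeros(A.shape[1])
        for step in self.steps:                 # one layer per iteration
            x = x - step * (A.T @ (A @ x - b))  # gradient step on ||Ax - b||^2
        return x
```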
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods rarely translate into true inference acceleration.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
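One way to read "micro-structured weight unification" is sketched below: weights within each small block share a single magnitude, so the block is cheap to store and compute. The block shape and the mean-magnitude rule are assumptions for illustration, not the paper's exact scheme.

```python
import torch

def unify_blocks(weight, block=4):
    """Force each 1 x block span of a weight matrix to share one magnitude
    (signs kept), yielding a hardware-friendly micro-structure.
    A hypothetical sketch of weight unification, not the paper's rule."""
    rows, cols = weight.shape
    assert cols % block == 0
    w = weight.reshape(rows, cols // block, block)
    shared = w.abs().mean(dim=-1, keepdim=True)     # one magnitude per block
    return (w.sign() * shared).reshape(rows, cols)
```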
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
- MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
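The third level is the easiest to picture. The sketch below keeps only the strongest block-by-block tiles of a weight matrix, scored here by L1 norm; the tile size and scoring rule are assumptions, not MLPruning's exact choices.

```python
import torch

def block_sparse_mask(weight, block=32, keep_ratio=0.5):
    """Block-wise sparse pruning: score each (block x block) tile and zero
    out the weakest tiles, keeping a hardware-friendly regular pattern."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0
    tiles = weight.reshape(rows // block, block, cols // block, block)
    scores = tiles.abs().sum(dim=(1, 3))            # one L1 score per tile
    k = max(1, int(keep_ratio * scores.numel()))
    cutoff = scores.flatten().kthvalue(scores.numel() - k + 1).values
    keep = (scores >= cutoff).to(weight.dtype)[:, None, :, None]
    return (tiles * keep).reshape(rows, cols)
```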
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
- DHP: Differentiable Meta Pruning via HyperNetworks [158.69345612783198]
This paper introduces a differentiable pruning method via hypernetworks for automatic network pruning.
Latent vectors control the output channels of the convolutional layers in the backbone network and act as a handle for the pruning of the layers.
Experiments are conducted on various networks for image classification, single image super-resolution, and denoising.
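A toy version of the latent-vector handle can be written directly: below, a per-layer latent is projected to sigmoid gates over output channels, and driving a gate toward zero prunes the channel. Names and sizes are illustrative, and DHP's hypernetwork is richer than this.

```python
import torch
import torch.nn as nn

class ChannelGateHyperNet(nn.Module):
    """Minimal hypernetwork-style pruning handle: a learnable latent vector
    is mapped to one gate per output channel; gates near zero mark channels
    that can be pruned. A sketch in the spirit of DHP, not its architecture."""

    def __init__(self, out_channels, latent_dim=8):
        super().__init__()
        self.latent = nn.Parameter(torch.randn(latent_dim))
        self.proj = nn.Linear(latent_dim, out_channels)

    def forward(self, feature_maps):                 # (N, C, H, W)
        gates = torch.sigmoid(self.proj(self.latent))
        return feature_maps * gates.view(1, -1, 1, 1)
```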
arXiv Detail & Related papers (2020-03-30T17:59:18Z)