Structured Model Pruning of Convolutional Networks on Tensor Processing Units
- URL: http://arxiv.org/abs/2107.04191v1
- Date: Fri, 9 Jul 2021 03:41:31 GMT
- Title: Structured Model Pruning of Convolutional Networks on Tensor Processing Units
- Authors: Kongtao Chen, Ken Franko, Ruoxin Sang
- Abstract summary: Structured model pruning is a promising approach to alleviate these requirements.
We measure the accuracy-efficiency trade-off for various structured model pruning methods and datasets.
We show that structured model pruning can significantly improve model memory usage and speed on TPUs without losing accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deployment of convolutional neural networks is often hindered by high
computational and storage requirements. Structured model pruning is a promising
approach to alleviate these requirements. Using the VGG-16 model as an example,
we measure the accuracy-efficiency trade-off for various structured model
pruning methods and datasets (CIFAR-10 and ImageNet) on Tensor Processing Units
(TPUs). To measure the actual performance of models, we develop a structured
model pruning library for TensorFlow2 to modify models in place (instead of
adding mask layers). We show that structured model pruning can significantly
improve model memory usage and speed on TPUs without losing accuracy,
especially for small datasets (e.g., CIFAR-10).
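The abstract's central implementation point is that the authors' TensorFlow2 library modifies models in place, so pruned channels are actually removed from the graph rather than hidden behind mask layers. Below is a minimal, hypothetical sketch of that idea in TensorFlow 2/Keras; it is not the authors' library or API. The function name, the keep_ratio parameter, and the L2-norm filter ranking are assumptions for illustration (the paper compares several structured pruning criteria).

```python
# Sketch only: rebuild a Conv2D layer with fewer filters instead of masking
# weights, so the saved model is genuinely smaller. Assumes a built layer
# with use_bias=True and channels-last data format.
import numpy as np
import tensorflow as tf

def prune_conv_filters(conv: tf.keras.layers.Conv2D, keep_ratio: float = 0.5):
    """Return a smaller Conv2D holding only the highest-L2-norm filters,
    plus the indices of the filters that were kept."""
    kernel, bias = conv.get_weights()            # kernel: (kh, kw, in_ch, out_ch)
    norms = np.linalg.norm(kernel.reshape(-1, kernel.shape[-1]), axis=0)
    n_keep = max(1, int(round(keep_ratio * kernel.shape[-1])))
    keep = np.sort(np.argsort(norms)[-n_keep:])  # keep the largest-norm filters

    new_conv = tf.keras.layers.Conv2D(
        filters=n_keep,
        kernel_size=conv.kernel_size,
        strides=conv.strides,
        padding=conv.padding,
        activation=conv.activation,
        name=conv.name + "_pruned",
    )
    new_conv.build(conv.input_shape)             # create variables, then overwrite them
    new_conv.set_weights([kernel[..., keep], bias[keep]])
    return new_conv, keep
```

In a real pipeline the layers consuming this output (the next convolution, batch normalization, etc.) must also be rebuilt with their input channels sliced to `keep`; automating that bookkeeping across the whole graph is what an in-place pruning library provides.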
Related papers
- RL-Pruner: Structured Pruning Using Reinforcement Learning for CNN Compression and Acceleration [0.0]
We propose RL-Pruner, which uses reinforcement learning to learn the optimal pruning distribution.
RL-Pruner can automatically extract dependencies between filters in the input model and perform pruning, without requiring model-specific pruning implementations.
arXiv Detail & Related papers (2024-11-10T13:35:10Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning [4.775684973625185]
Machine learning pipelines often train a universal model to achieve accuracy across a broad range of classes.
Individual users, however, often need only a subset of those classes; this disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes.
We propose CRISP, a novel pruning framework that combines fine-grained N:M structured sparsity and coarse-grained block sparsity (a generic N:M sparsity sketch appears after this list).
Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes.
arXiv Detail & Related papers (2023-11-24T04:16:32Z)
- Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a re-parameterized optimizer, referred to as RepOpt-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z)
- Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks [20.374784902476318]
Pruning, which introduces zeros into model weights, has been shown to provide good trade-offs between model accuracy and computation efficiency.
Some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and store operations on such memories.
In this work, we propose a set of novel sparse patterns, named gather-scatter (GS) patterns, to utilize the scratchpad memories and gather/scatter engines to speed up neural network inferences.
arXiv Detail & Related papers (2021-12-20T22:55:45Z)
- Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)
- DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search [55.164053971213576]
Convolutional neural networks have achieved great success in computer vision tasks, despite their large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules, which may lead to a tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
- A Gradient Flow Framework For Analyzing Network Pruning [11.247894240593693]
Recent network pruning methods focus on pruning models early on in training.
We develop a general framework that uses gradient flow to unify importance measures through the norm of model parameters.
We validate our claims on several VGG-13, MobileNet-V1, and ResNet-56 models trained on CIFAR-10/CIFAR-100.
arXiv Detail & Related papers (2020-09-24T17:37:32Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
- Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR).
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z)
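The CRISP entry above refers to fine-grained N:M structured sparsity. As a generic illustration of that pattern (not CRISP's gradient-based, class-aware saliency method, and not tied to any paper's released code), the sketch below keeps the N largest-magnitude weights in every group of M consecutive weights along the input dimension; the function name and shapes are hypothetical.

```python
# Generic N:M structured sparsity: within every group of M consecutive
# weights, zero out all but the N largest-magnitude entries.
import numpy as np

def nm_sparsify(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Apply an N:M sparsity pattern along the last axis of a 2-D weight
    matrix (out_features, in_features); in_features must be divisible by M."""
    out_f, in_f = weights.shape
    groups = weights.reshape(out_f, in_f // m, m)
    # Indices of the (M - N) smallest-magnitude weights in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=-1)
    return (groups * mask).reshape(out_f, in_f)

# Example: 2:4 sparsity keeps 2 of every 4 consecutive weights.
w = np.random.randn(4, 8)
print(nm_sparsify(w, n=2, m=4))
```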