Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression
- URL: http://arxiv.org/abs/2408.03046v1
- Date: Tue, 6 Aug 2024 09:02:31 GMT
- Title: Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression
- Authors: Jonas Schmitt, Ruiping Liu, Junwei Zheng, Jiaming Zhang, Rainer Stiefelhagen,
- Abstract summary: We propose a novel unified pruning framework Comb, Prune, Distill (CPD) to address both model-agnostic and task-agnostic concerns simultaneously.
Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence.
In image classification we achieve a speedup of up to x4.3 with a accuracy loss of 1.8% and in semantic segmentation up to x1.89 with a 5.1% loss in mIoU.
- Score: 24.119415458653616
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lightweight and effective models are essential for devices with limited resources, such as intelligent vehicles. Structured pruning offers a promising approach to model compression and efficiency enhancement. However, existing methods often tie pruning techniques to specific model architectures or vision tasks. To address this limitation, we propose a novel unified pruning framework Comb, Prune, Distill (CPD), which addresses both model-agnostic and task-agnostic concerns simultaneously. Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence. Additionally, the pruning pipeline adaptively remove parameters based on the importance scoring metrics regardless of vision tasks. To support the model in retaining its learned information, we introduce knowledge distillation during the pruning step. Extensive experiments demonstrate the generalizability of our framework, encompassing both convolutional neural network (CNN) and transformer models, as well as image classification and segmentation tasks. In image classification we achieve a speedup of up to x4.3 with a accuracy loss of 1.8% and in semantic segmentation up to x1.89 with a 5.1% loss in mIoU.
Related papers
- Generalized Nested Latent Variable Models for Lossy Coding applied to Wind Turbine Scenarios [14.48369551534582]
A learning-based approach seeks to minimize the compromise between compression rate and reconstructed image quality.
A successful technique consists in introducing a deep hyperprior that operates within a 2-level nested latent variable model.
This paper extends this concept by designing a generalized L-level nested generative model with a Markov chain structure.
arXiv Detail & Related papers (2024-06-10T11:00:26Z) - A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
Stable Diffusion Model is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimize for the L2 metric without the need of generating pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architecture like CNNs, RNNs, GNNs and Transformers.
We propose a general and fully automatic method, emphDependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a
arXiv Detail & Related papers (2023-01-30T14:02:33Z) - Distributed Pruning Towards Tiny Neural Networks in Federated Learning [12.63559789381064]
FedTiny is a distributed pruning framework for federated learning.
It generates specialized tiny models for memory- and computing-constrained devices.
It achieves an accuracy improvement of 2.61% while significantly reducing the computational cost by 95.91%.
arXiv Detail & Related papers (2022-12-05T01:58:45Z) - Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC)
We show that a Transformer-CTC model can be pruned in various depth on demand, improving real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z) - Compact CNN Structure Learning by Knowledge Distillation [34.36242082055978]
We propose a framework that leverages knowledge distillation along with customizable block-wise optimization to learn a lightweight CNN structure.
Our method results in a state of the art network compression while being capable of achieving better inference accuracy.
In particular, for the already compact network MobileNet_v2, our method offers up to 2x and 5.2x better model compression.
arXiv Detail & Related papers (2021-04-19T10:34:22Z) - DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator
Search [55.164053971213576]
convolutional neural network has achieved great success in fulfilling computer vision tasks despite large computation overhead.
Structured (channel) pruning is usually applied to reduce the model redundancy while preserving the network structure.
Existing structured pruning methods require hand-crafted rules which may lead to tremendous pruning space.
arXiv Detail & Related papers (2020-11-04T07:43:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.