GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous
Structured Pruning for Vision Transformer
- URL: http://arxiv.org/abs/2301.05345v1
- Date: Fri, 13 Jan 2023 00:40:24 GMT
- Title: GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous
Structured Pruning for Vision Transformer
- Authors: Miao Yin, Burak Uzkent, Yilin Shen, Hongxia Jin, Bo Yuan
- Abstract summary: Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks, but their large model sizes hinder deployment in resource-constrained applications.
To mitigate this problem, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
- Score: 76.2625311630021
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The recently proposed Vision transformers (ViTs) have shown very impressive
empirical performance in various computer vision tasks, and they are viewed as
an important type of foundation model. However, ViTs are typically constructed
with large model sizes, which severely hinders their potential deployment
in many practical resource-constrained applications. To mitigate this
challenging problem, structured pruning is a promising solution to compress
model size and enable practical efficiency. However, unlike its current
popularity for CNNs and RNNs, structured pruning for ViT models remains little
explored.
In this paper, we propose GOHSP, a unified framework of Graph and
Optimization-based Structured Pruning for ViT models. We first develop a
graph-based ranking for measuring the importance of attention heads, and the
extracted importance information is further integrated into an optimization-based
procedure to impose heterogeneous structured sparsity patterns on the ViT
models. Experimental results show that our proposed GOHSP demonstrates
excellent compression performance. On the CIFAR-10 dataset, our approach brings
a 40% parameter reduction with no accuracy loss for the ViT-Small model. On the
ImageNet dataset, with 30% and 35% sparsity ratios for the DeiT-Tiny and
DeiT-Small models, our approach achieves 1.65% and 0.76% higher accuracy than
existing structured pruning methods, respectively.
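The abstract only sketches the method at a high level, so the snippet below is a minimal illustration, not GOHSP's actual algorithm: it treats attention heads as nodes of a cosine-similarity graph, scores them with a PageRank-style power iteration, and marks the lowest-scoring heads for removal at a target sparsity ratio. The feature construction, the scoring direction (higher score = keep), and all function names (head_importance_ranking, heads_to_prune) are illustrative assumptions.

    import numpy as np

    def head_importance_ranking(head_features, damping=0.85, iters=100):
        """Hypothetical graph-based importance ranking for attention heads.

        head_features: (num_heads, feature_dim) array, e.g. per-head output
        statistics collected on a small calibration set. A similarity graph
        is built over the heads and each node is scored by a PageRank-style
        power iteration; here a higher score is read as "more important".
        """
        # Cosine-similarity adjacency matrix between heads (self-loops removed).
        normed = head_features / (np.linalg.norm(head_features, axis=1, keepdims=True) + 1e-8)
        adj = np.clip(normed @ normed.T, 0.0, None)
        np.fill_diagonal(adj, 0.0)

        # Column-normalize to obtain a transition matrix for the power iteration.
        trans = adj / (adj.sum(axis=0, keepdims=True) + 1e-8)

        n = head_features.shape[0]
        score = np.full(n, 1.0 / n)
        for _ in range(iters):
            score = (1.0 - damping) / n + damping * trans @ score
        return score  # one importance score per attention head

    def heads_to_prune(scores, sparsity=0.3):
        """Indices of the least important heads at a given sparsity ratio."""
        k = int(round(sparsity * len(scores)))
        return np.argsort(scores)[:k]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feats = rng.normal(size=(12, 64))  # e.g. 12 attention heads, 64-dim summary features
        scores = head_importance_ranking(feats)
        print("candidate heads to prune:", heads_to_prune(scores, sparsity=0.3))

In GOHSP these graph-derived scores are not used in isolation; the abstract states that the extracted importance information is fed into an optimization-based procedure that imposes heterogeneous structured sparsity, so a ranking like the one above would only be the first stage of the pipeline.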
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2023-07-25T19:03:21Z) - COMCAT: Towards Efficient Compression and Customization of
Attention-Based Vision Models [21.07857091998763]
This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models.
For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters.
arXiv Detail & Related papers (2023-05-26T19:50:00Z) - Multi-Dimensional Model Compression of Vision Transformer [21.8311401851523]
Vision transformers (ViT) have recently attracted considerable attention, but the huge computational cost remains an issue for practical deployment.
Previous ViT pruning methods tend to prune the model along only one dimension.
We advocate a multi-dimensional ViT compression paradigm, and propose to harness the redundancy reduction from attention head, neuron and sequence dimensions jointly.
arXiv Detail & Related papers (2021-12-31T19:54:18Z) - A Unified Pruning Framework for Vision Transformers [40.7622551128182]
Vision transformer (ViT) and its variants have achieved promising performance in various computer vision tasks.
We propose a unified framework for structural pruning of ViTs and their variants, namely UP-ViTs.
Our method focuses on pruning all components of ViTs while maintaining the consistency of the model structure.
arXiv Detail & Related papers (2021-11-30T05:01:02Z) - Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criterion comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z) - When Vision Transformers Outperform ResNets without Pretraining or
Strong Data Augmentations [111.44860506703307]
Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures.
This paper investigates ViTs and MLP-Mixers from the lens of loss geometry, intending to improve the models' data efficiency at training and inference.
We show that the improved robustness is attributable to sparser active neurons in the first few layers.
The resultant ViTs outperform ResNets of similar size and throughput when trained from scratch on ImageNet without large-scale pretraining or strong data augmentations.
arXiv Detail & Related papers (2021-06-03T02:08:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.