Global Vision Transformer Pruning with Hessian-Aware Saliency
- URL: http://arxiv.org/abs/2110.04869v2
- Date: Wed, 29 Mar 2023 21:00:43 GMT
- Title: Global Vision Transformer Pruning with Hessian-Aware Saliency
- Authors: Huanrui Yang, Hongxu Yin, Maying Shen, Pavlo Molchanov, Hai Li, Jan
Kautz
- Abstract summary: This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criterion that is comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter redistribution that utilizes parameters more efficiently.
- Score: 93.33895899995224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers yield state-of-the-art results across many tasks. However, their
heuristically designed architectures impose huge computational costs during
inference. This work challenges the common design philosophy of the
Vision Transformer (ViT) model with uniform dimension across all the stacked
blocks in a model stage, where we redistribute the parameters both across
transformer blocks and between different structures within the block via the
first systematic attempt on global structural pruning. Dealing with diverse ViT
structural components, we derive a novel Hessian-based structural pruning
criterion comparable across all layers and structures, with latency-aware
regularization for direct latency reduction. Performing iterative pruning on
the DeiT-Base model leads to a new architecture family called NViT (Novel ViT),
with a novel parameter redistribution that utilizes parameters more
efficiently. On ImageNet-1K, NViT-Base achieves a 2.6x FLOPs reduction, 5.1x
parameter reduction, and 1.9x run-time speedup over the DeiT-Base model in a
near lossless manner. Smaller NViT variants achieve more than 1% accuracy gain
at the same throughput as the DeiT Small/Tiny variants, as well as a lossless
3.3x parameter reduction over the SWIN-Small model. These results outperform
prior art by a large margin. Further analysis is provided on the parameter
redistribution insight of NViT, where we show the high prunability of ViT
models, the distinct sensitivity within a ViT block, and the unique parameter
distribution trend across stacked ViT blocks. Our insights motivate a simple yet
effective parameter redistribution rule towards more efficient ViTs, yielding an
off-the-shelf performance boost.
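The Hessian-aware saliency and latency-aware regularization described above can be sketched as follows. This is a minimal illustration only, assuming an empirical-Fisher approximation of the Hessian diagonal and a hypothetical per-group `latency_gain` estimate; the exact NViT criterion and regularizer are defined in the paper and are not reproduced here.

```python
# Minimal sketch of a Hessian-aware structural saliency combined with a
# latency-aware term. The empirical-Fisher approximation of the Hessian
# diagonal, the function names, and the per-group `latency_gain` value are
# illustrative assumptions, not the exact NViT criterion.
import torch
import torch.nn as nn
import torch.nn.functional as F

def group_saliency(params, grads):
    """Second-order importance of one structural group (e.g. an attention head
    or an MLP channel): 0.5 * w^2 * diag(H), with diag(H) approximated by the
    squared gradient (empirical Fisher)."""
    return sum(0.5 * (w.detach() ** 2 * g.detach() ** 2).sum()
               for w, g in zip(params, grads))

def latency_aware_score(saliency, latency_gain, alpha=0.1):
    """Discount groups whose removal yields a larger measured latency saving,
    steering pruning toward direct latency reduction."""
    return saliency - alpha * latency_gain

# Toy usage: treat the four output rows of a linear layer as prunable groups.
model = nn.Linear(16, 4)                      # stand-in for one ViT sub-structure
x, y = torch.randn(8, 16), torch.randn(8, 4)  # one calibration batch
F.mse_loss(model(x), y).backward()            # populates model.weight.grad

scores = [
    float(latency_aware_score(
        group_saliency([model.weight[i]], [model.weight.grad[i]]),
        latency_gain=1.0))                    # assumed per-group latency saving
    for i in range(4)
]
prune_order = sorted(range(4), key=lambda i: scores[i])
print(prune_order)  # lowest-scoring groups would be pruned first
```

In practice such scores would be accumulated over many calibration batches and the lowest-scoring structural groups (heads, MLP channels, embedding dimensions) removed iteratively with fine-tuning in between, in the spirit of the iterative pruning procedure the abstract describes.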
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- LPViT: Low-Power Semi-structured Pruning for Vision Transformers [42.91130720962956]
Vision transformers (ViTs) have emerged as a promising alternative to convolutional neural networks for image analysis tasks.
One significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, complexity, and power consumption.
We introduce a new block-structured pruning approach to address the resource-intensive nature of ViTs, offering a balanced trade-off between accuracy and hardware acceleration.
arXiv Detail & Related papers (2024-07-02T08:58:19Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while incurring only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers [15.108494142240993]
Vision Transformers (ViTs) have demonstrated remarkable performance in various computer vision tasks.
High computational complexity hinders ViTs' applicability on devices with limited memory and computing resources.
We propose a novel channel shuffle module to improve tiny-size ViTs.
arXiv Detail & Related papers (2023-10-09T11:56:35Z)
- GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer [76.2625311630021]
Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks.
To mitigate their high computational cost, structured pruning is a promising solution to compress model size and enable practical efficiency.
We propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models.
arXiv Detail & Related papers (2023-01-13T00:40:24Z)
- EIT: Efficiently Lead Inductive Biases to ViT [17.66805405320505]
Vision Transformer (ViT) depends on properties similar to the inductive bias inherent in Convolutional Neural Networks.
We propose an architecture called Efficiently lead Inductive biases to ViT (EIT), which can effectively lead the inductive biases to both phases of ViT.
On four popular small-scale datasets, EIT improves accuracy over ViT by 12.6% on average, with fewer parameters and FLOPs.
arXiv Detail & Related papers (2022-03-14T14:01:17Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens that are processed in the network as inference proceeds.
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- A Unified Pruning Framework for Vision Transformers [40.7622551128182]
Vision transformer (ViT) and its variants have achieved promising performances in various computer vision tasks.
We propose a unified framework for structural pruning of ViTs and their variants, namely UP-ViTs.
Our method prunes all ViT components while maintaining the consistency of the model structure.
arXiv Detail & Related papers (2021-11-30T05:01:02Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.