Visual Transformer Pruning
- URL: http://arxiv.org/abs/2104.08500v2
- Date: Tue, 20 Apr 2021 04:50:49 GMT
- Title: Visual Transformer Pruning
- Authors: Mingjian Zhu, Kai Han, Yehui Tang, Yunhe Wang
- Abstract summary: We present a visual transformer pruning approach, which identifies the impact of channels in each layer and then executes pruning accordingly.
The pipeline for visual transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning channels; 3) finetuning.
The parameter and FLOPs reduction ratios of the proposed algorithm are evaluated and analyzed on the ImageNet dataset to demonstrate its effectiveness.
- Score: 44.43429237788078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual transformers have achieved competitive performance on a variety of
computer vision applications. However, their storage, run-time memory, and
computational demands hinder deployment on mobile devices. Here we
present a visual transformer pruning approach, which identifies the impact of
channels in each layer and then executes pruning accordingly. By encouraging
channel-wise sparsity in the Transformer, important channels automatically
emerge. A large number of channels with small coefficients can be discarded to
achieve a high pruning ratio without significantly compromising accuracy. The
pipeline for visual transformer pruning is as follows: 1) training with
sparsity regularization; 2) pruning channels; 3) finetuning. The parameter and
FLOPs reduction ratios of the proposed algorithm are evaluated and analyzed on
the ImageNet dataset to demonstrate its effectiveness.
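The three-step pipeline can be illustrated with a short PyTorch sketch. This is a minimal sketch rather than the authors' implementation: the learnable per-channel gate (`GatedLinear`), the L1 penalty weight, and the `keep_ratio` threshold are assumed stand-ins for the paper's channel-importance scores and pruning criterion.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose output channels are scaled by learnable gate coefficients."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.gate = nn.Parameter(torch.ones(out_dim))  # per-channel importance

    def forward(self, x):
        return self.linear(x) * self.gate

def sparsity_penalty(model, weight=1e-4):
    """Step 1: L1 regularization that pushes unimportant gate coefficients toward zero."""
    return weight * sum(m.gate.abs().sum()
                        for m in model.modules() if isinstance(m, GatedLinear))

@torch.no_grad()
def prune_channels(layer, keep_ratio=0.5):
    """Step 2: keep only the output channels with the largest gate magnitudes."""
    k = max(1, int(layer.gate.numel() * keep_ratio))
    keep = layer.gate.abs().topk(k).indices.sort().values
    pruned = GatedLinear(layer.linear.in_features, k)
    pruned.linear.weight.copy_(layer.linear.weight[keep])
    pruned.linear.bias.copy_(layer.linear.bias[keep])
    pruned.gate.copy_(layer.gate[keep])
    return pruned

# Step 1 usage: loss = task_loss + sparsity_penalty(model)
# Step 3: finetune the pruned (smaller) model on the original task.
```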
Related papers
- Automatic Channel Pruning for Multi-Head Attention [0.11049608786515838]
We propose an automatic channel pruning method that takes the multi-head attention mechanism into account.
On ImageNet-1K, applying our pruning method to the FLattenTransformer improves accuracy at several MACs budgets.
arXiv Detail & Related papers (2024-05-31T14:47:20Z)
- Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference [14.030836300221756]
Sparse-Tuning is a novel PEFT method that accounts for the information redundancy in images and videos.
Sparse-Tuning minimizes the quantity of tokens processed at each layer, leading to a quadratic reduction in computational and memory overhead.
Our results show that Sparse-Tuning reduces GFLOPs to 62%-70% of the original ViT-B while achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-05-23T15:34:53Z)
- SPION: Layer-Wise Sparse Training of Transformer via Convolutional Flood Filling [1.0128808054306186]
We propose a novel sparsification scheme for the Transformer that integrates convolution filters and the flood filling method.
Our sparsification approach reduces the computational complexity and memory footprint of the Transformer during training.
SPION achieves up to a 3.08X speedup over existing state-of-the-art sparse Transformer models.
arXiv Detail & Related papers (2023-09-22T02:14:46Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- Three things everyone should know about Vision Transformers [67.30250766591405]
Transformer architectures have rapidly gained traction in computer vision.
We offer three insights based on simple and easy-to-implement variants of vision transformers.
We evaluate the impact of these design choices using the ImageNet-1k dataset, and confirm our findings on the ImageNet-v2 test set.
arXiv Detail & Related papers (2022-03-18T08:23:03Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens processed in the network as inference proceeds.
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Adaptive Channel Encoding Transformer for Point Cloud Analysis [6.90125287791398]
A channel convolution called Transformer-Conv is designed to encode the channels.
It can encode feature channels by capturing the potential relationship between coordinates and features.
Our method is superior to state-of-the-art point cloud classification and segmentation methods on three benchmark datasets.
arXiv Detail & Related papers (2021-12-05T08:18:00Z)
- Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer [63.99222215387881]
We propose Evo-ViT, a self-motivated slow-fast token evolution method for vision transformers.
Our method can significantly reduce the computational costs of vision transformers while maintaining comparable performance on image classification.
arXiv Detail & Related papers (2021-08-03T09:56:07Z)
- Augmented Shortcuts for Vision Transformers [49.70151144700589]
We study the relationship between shortcuts and feature diversity in vision transformer models.
We present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel with the original shortcuts (see the sketch after this list).
Experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-06-30T09:48:30Z)
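For the augmented-shortcut scheme summarized in the last entry, a minimal sketch follows, assuming the extra parallel paths are simple learnable linear projections added to the identity shortcut; the paper's exact parameterization and number of paths may differ.

```python
import torch.nn as nn

class AugmentedShortcutBlock(nn.Module):
    """Attention block whose identity shortcut is augmented with parallel learnable paths."""
    def __init__(self, dim, num_heads=4, num_aug_paths=1):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Learnable paths running in parallel with the identity shortcut (assumption: plain Linear).
        self.aug_paths = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_aug_paths))

    def forward(self, x):  # x: (batch, tokens, dim)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        shortcut = x + sum(path(x) for path in self.aug_paths)  # identity + augmented paths
        return shortcut + attn_out
```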
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.