Automatic Channel Pruning for Multi-Head Attention
- URL: http://arxiv.org/abs/2405.20867v1
- Date: Fri, 31 May 2024 14:47:20 GMT
- Title: Automatic Channel Pruning for Multi-Head Attention
- Authors: Eunho Lee, Youngbae Hwang
- Abstract summary: We propose an automatic channel pruning method that takes the multi-head attention mechanism into account.
On ImageNet-1K, applying our pruning method to FLattenTransformer achieves higher accuracy at several MAC budgets.
- Score: 0.11049608786515838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the strong performance of Transformers, their quadratic computation complexity presents challenges in applying them to vision tasks. Automatic pruning is an effective way to reduce computation complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method that takes the multi-head attention mechanism into account. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust the pruning indicator to enforce removal of channels in equal proportions across all heads, preventing channel misalignment. We also add a reweight module to compensate for the information loss resulting from channel removal, and an effective initialization step for the pruning indicator based on the difference in attention between the original structure and each channel. Our proposed method can be applied not only to original attention but also to linear attention, which is more efficient due to its linear complexity with respect to the number of tokens. On ImageNet-1K, applying our pruning method to FLattenTransformer, which includes both attention mechanisms, achieves higher accuracy at several MAC budgets than previous state-of-the-art efficient models and pruning methods. Code will be available soon.
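As a rough illustration of the pruning mechanics described above, the sketch below scores the channels of each attention head with a similarity-weighted importance (down-weighting redundant channels, as an illustrative stand-in for the paper's similarity-based weights) and then keeps an equal number of top-scoring channels in every head so that the pruned head dimensions stay aligned. The tensor shapes, keep ratio, and scoring proxy are assumptions, not the authors' implementation.

```python
import torch

def similarity_weighted_scores(w_head: torch.Tensor) -> torch.Tensor:
    """w_head: [head_dim, in_dim] projection rows for one head.
    Down-weight channels that are highly similar to the others so that more
    informative channels are preserved -- an illustrative proxy for the
    paper's channel-similarity-based weighting."""
    w = torch.nn.functional.normalize(w_head, dim=1)
    sim = (w @ w.t()).abs()                                 # [head_dim, head_dim] cosine similarity
    redundancy = (sim.sum(dim=1) - 1.0) / (w.shape[0] - 1)  # mean similarity to other channels
    magnitude = w_head.norm(dim=1)                          # base importance
    return magnitude * (1.0 - redundancy)

def head_aligned_keep_mask(scores: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """scores: [num_heads, head_dim]; returns a boolean mask that keeps the
    same number of channels in every head, avoiding channel misalignment."""
    num_heads, head_dim = scores.shape
    keep_per_head = max(1, round(keep_ratio * head_dim))
    mask = torch.zeros_like(scores, dtype=torch.bool)
    top = scores.topk(keep_per_head, dim=1).indices         # best channels per head
    mask.scatter_(1, top, True)
    return mask

# Toy usage: 4 heads, head_dim 16, keep 75% of the channels in every head.
torch.manual_seed(0)
proj = torch.randn(4, 16, 64)                               # [heads, head_dim, in_dim]
scores = torch.stack([similarity_weighted_scores(p) for p in proj])
mask = head_aligned_keep_mask(scores, keep_ratio=0.75)
print(mask.sum(dim=1))                                      # tensor([12, 12, 12, 12])
```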
Related papers
- Skip-Attention: Improving Vision Transformers by Paying Less Attention [55.47058516775423]
Vision transformers (ViTs) use expensive self-attention operations in every layer.
We propose SkipAt, a method to reuse self-attention from preceding layers to approximate attention at one or more subsequent layers.
We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS.
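As a toy illustration of the attention-reuse idea, the sketch below lets a later attention layer apply a cached attention map from an earlier layer to its own values instead of recomputing it. This is not the SkipAt method itself; the module, shapes, and reuse pattern are assumptions chosen only to show cross-layer reuse.

```python
import torch
import torch.nn as nn

class ReusableAttention(nn.Module):
    """Multi-head attention that can optionally apply an attention map
    cached by an earlier layer instead of recomputing its own."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, cached_attn=None):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                 # each [B, heads, N, head_dim]
        if cached_attn is None:
            attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
            attn = attn.softmax(dim=-1)                       # compute attention as usual
        else:
            attn = cached_attn                                # reuse attention from an earlier layer
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out), attn

# Toy usage: the second layer reuses the attention map computed by the first.
x = torch.randn(2, 49, 64)
layer1, layer2 = ReusableAttention(64, 4), ReusableAttention(64, 4)
y1, attn1 = layer1(x)
y2, _ = layer2(y1, cached_attn=attn1)                         # skips its own QK^T and softmax
print(y2.shape)                                               # torch.Size([2, 49, 64])
```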
arXiv Detail & Related papers (2023-01-05T18:59:52Z)
- Revisiting Random Channel Pruning for Neural Network Compression [159.99002793644163]
Channel (or 3D filter) pruning serves as an effective way to accelerate the inference of neural networks.
In this paper, we try to determine the channel configuration of the pruned models by random search.
We show that this simple strategy works quite well compared with other channel pruning methods.
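A minimal sketch of the random-search strategy just described: sample per-layer channel counts under a rough cost budget and keep the best configuration according to an evaluation function. The layer widths, cost proxy, budget, and evaluate() stub are placeholders; in practice each sampled configuration would be trained or finetuned and then scored on a validation set.

```python
import random

def sample_config(base_widths, budget, trials=1000):
    """Sample per-layer channel counts whose crude cost proxy fits the budget."""
    for _ in range(trials):
        cfg = [max(1, int(w * random.uniform(0.3, 1.0))) for w in base_widths]
        cost = sum(a * b for a, b in zip(cfg[:-1], cfg[1:]))  # product of adjacent widths
        if cost <= budget:
            return cfg
    return None

def evaluate(cfg):
    """Placeholder: in practice, build the pruned model with these widths,
    train or finetune it briefly, and return validation accuracy."""
    return random.random()

base_widths = [64, 128, 256, 512]
full_cost = sum(a * b for a, b in zip(base_widths[:-1], base_widths[1:]))
budget = 0.5 * full_cost                                      # aim for roughly half the cost

best_cfg, best_score = None, float("-inf")
for _ in range(20):                                           # try 20 random configurations
    cfg = sample_config(base_widths, budget)
    if cfg is None:
        continue
    score = evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score

print("best configuration:", best_cfg, "score:", best_score)
```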
arXiv Detail & Related papers (2022-05-11T17:59:04Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable for adaptively pruning efficient networks for various classification subtasks, facilitating the deployment and use of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structure, including those with coupled channels.
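One common way to realize a Fisher-style channel importance score is sketched below under illustrative assumptions: attach an identity gate to each channel and accumulate the squared gradient of the loss with respect to that gate. The tiny CNN, data, and number of batches are toys, and the grouped treatment of coupled channels from the paper is not modeled.

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Convolution followed by a fixed per-channel identity gate; the gate's
    gradient provides a per-channel sensitivity signal."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.gate = nn.Parameter(torch.ones(c_out))           # stays at 1, used only for its gradient

    def forward(self, x):
        return self.conv(x) * self.gate.view(1, -1, 1, 1)

model = nn.Sequential(GatedConv(3, 8), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 10))
fisher = torch.zeros(8)

for _ in range(4):                                            # a few toy batches
    x = torch.randn(16, 3, 32, 32)
    y = torch.randint(0, 10, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    g = model[0].gate.grad                                    # dL/d(gate), one value per channel
    fisher += g.detach() ** 2                                 # Fisher-style accumulation

least_important = fisher.argsort()[:2]                        # candidate channels to remove
print(fisher)
print("least important channels:", least_important.tolist())
```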
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Visual Transformer Pruning [44.43429237788078]
We present a visual transformer pruning approach that identifies the impact of channels in each layer and then prunes accordingly.
The pipeline for visual transformer pruning is as follows: 1) training with sparsity regularization; 2) pruning channels; 3) finetuning.
The parameter and FLOPs reduction ratios of the proposed algorithm are evaluated and analyzed on the ImageNet dataset to demonstrate its effectiveness.
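A minimal sketch of that three-stage pipeline on a toy two-layer MLP: learn a soft channel mask under an L1 sparsity penalty, prune the lowest-scoring hidden channels (folding the surviving mask values into the following layer), and finetune. The toy model, reconstruction objective, penalty weight, and 50% keep ratio are assumptions, not the paper's recipe.

```python
import torch
import torch.nn as nn

dim, hidden = 32, 64
fc1, act, fc2 = nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
mask = nn.Parameter(torch.ones(hidden))                       # learnable per-channel scores
opt = torch.optim.Adam(list(fc1.parameters()) + list(fc2.parameters()) + [mask], lr=1e-3)

# 1) Training with sparsity regularization: an L1 penalty on the mask shrinks
#    the scores of unimportant hidden channels.
for _ in range(200):
    x = torch.randn(64, dim)
    out = fc2(act(fc1(x)) * mask)                             # soft-masked hidden channels
    loss = (out - x).pow(2).mean() + 1e-3 * mask.abs().sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) Pruning channels: keep the half of the hidden channels with the largest
#    learned scores, folding the surviving mask values into fc2's columns.
keep = mask.detach().abs().topk(hidden // 2).indices.sort().values
fc1_p, fc2_p = nn.Linear(dim, len(keep)), nn.Linear(len(keep), dim)
with torch.no_grad():
    fc1_p.weight.copy_(fc1.weight[keep])
    fc1_p.bias.copy_(fc1.bias[keep])
    fc2_p.weight.copy_(fc2.weight[:, keep] * mask[keep])
    fc2_p.bias.copy_(fc2.bias)

# 3) Finetuning the pruned network on the same objective, without the penalty.
opt = torch.optim.Adam(list(fc1_p.parameters()) + list(fc2_p.parameters()), lr=1e-3)
for _ in range(200):
    x = torch.randn(64, dim)
    loss = (fc2_p(act(fc1_p(x))) - x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"kept {len(keep)}/{hidden} hidden channels")
```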
arXiv Detail & Related papers (2021-04-17T09:49:24Z)
- Channelized Axial Attention for Semantic Segmentation [70.14921019774793]
We propose Channelized Axial Attention (CAA) to seamlessly integrate channel attention and axial attention with reduced computational complexity.
Our CAA not only requires far fewer computational resources than other dual-attention models such as DANet, but also outperforms state-of-the-art ResNet-101-based segmentation models on all tested datasets.
arXiv Detail & Related papers (2021-01-19T03:08:03Z)
- AutoPruning for Deep Neural Network with Dynamic Channel Masking [28.018077874687343]
We propose a learning-based automatic pruning algorithm for deep neural networks.
A two-objective problem that jointly targets the weights and the best channels for each layer is first formulated.
An alternative optimization approach is then proposed to derive the optimal channel numbers and weights simultaneously.
arXiv Detail & Related papers (2020-10-22T20:12:46Z)
- Operation-Aware Soft Channel Pruning using Differentiable Masks [51.04085547997066]
We propose a data-driven algorithm, which compresses deep neural networks in a differentiable way by exploiting the characteristics of operations.
We perform extensive experiments and achieve outstanding performance in terms of the accuracy of the output networks.
arXiv Detail & Related papers (2020-07-08T07:44:00Z)
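A minimal sketch of a differentiable (soft) channel mask in the spirit of the entry above: each channel gets a learnable logit, the mask is its sigmoid so gradients flow during training, and a simple budget regularizer steers the expected number of active channels toward a target. The operation-aware aspects described in the paper are not modeled here; the layer, losses, temperature, and budget are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftChannelMask(nn.Module):
    """Differentiable per-channel gate: a sigmoid of a learnable logit."""
    def __init__(self, channels: int, temperature: float = 0.5):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(channels))     # one gate logit per channel
        self.temperature = temperature

    def gate(self):
        return torch.sigmoid(self.logits / self.temperature)

    def forward(self, x):                                     # x: [B, C, H, W]
        return x * self.gate().view(1, -1, 1, 1)

conv = nn.Conv2d(3, 16, 3, padding=1)
mask = SoftChannelMask(16)
opt = torch.optim.Adam(list(conv.parameters()) + list(mask.parameters()), lr=1e-2)

target_channels = 8.0                                         # budget: roughly 8 of 16 channels
for _ in range(200):
    x = torch.randn(8, 3, 32, 32)
    y = mask(conv(x))
    task_loss = y.pow(2).mean()                               # placeholder for a real task loss
    budget_loss = (mask.gate().sum() - target_channels).pow(2)
    loss = task_loss + 0.01 * budget_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

keep = mask.gate() > 0.5                                      # hard decision after training
print(f"channels kept: {int(keep.sum())}/16")
```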
This list is automatically generated from the titles and abstracts of the papers on this site.