The Lighter The Better: Rethinking Transformers in Medical Image
Segmentation Through Adaptive Pruning
- URL: http://arxiv.org/abs/2206.14413v1
- Date: Wed, 29 Jun 2022 05:49:36 GMT
- Title: The Lighter The Better: Rethinking Transformers in Medical Image
Segmentation Through Adaptive Pruning
- Authors: Xian Lin, Li Yu, Kwang-Ting Cheng, and Zengqiang Yan
- Abstract summary: We propose employing adaptive pruning in transformers for medical image segmentation, yielding a lightweight network, APFormer.
To the best of our knowledge, this is the first work on transformer pruning for medical image analysis tasks.
We show, through ablation studies, that adaptive pruning can work as a plug-and-play module for performance improvement on other hybrid- and transformer-based methods.
- Score: 26.405243756778606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers have recently set off a new wave in the field of medical image analysis due to their remarkable performance on various computer vision tasks. However, recent hybrid- and transformer-based approaches mainly focus on the benefits of transformers in capturing long-range dependencies while ignoring their daunting computational complexity, high training costs, and redundant dependencies. In this paper, we propose employing adaptive pruning in transformers for medical image segmentation and build on it a lightweight and effective hybrid network, APFormer. To the best of our knowledge, this is the first work on transformer pruning for medical image analysis tasks. The key features of APFormer are self-supervised self-attention (SSA), which improves the convergence of dependency establishment; Gaussian-prior relative position embedding (GRPE), which fosters the learning of position information; and adaptive pruning, which eliminates redundant computations and perception information. Specifically, SSA and GRPE take the well-converged dependency distribution and the Gaussian heatmap distribution, respectively, as prior knowledge for self-attention and position embedding, easing the training of transformers and laying a solid foundation for the subsequent pruning operation. Adaptive transformer pruning, both query-wise and dependency-wise, is then performed by adjusting the gate control parameters, reducing complexity while improving performance. Extensive experiments on two widely used datasets demonstrate the superior segmentation performance of APFormer over state-of-the-art methods with far fewer parameters and lower GFLOPs. More importantly, we show through ablation studies that adaptive pruning can work as a plug-and-play module for performance improvement on other hybrid- and transformer-based methods. Code is available at https://github.com/xianlin7/APFormer.
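To make the gating mechanism concrete, here is a minimal sketch of how a Gaussian-prior position bias and query-/dependency-wise gates could be wired into self-attention, based only on the description above. It is a 1D, single-head simplification of the paper's 2D multi-head setting, and all names (sigma, q_gate, dep_gate, keep_threshold) are illustrative rather than the authors' API; see the repository for the actual implementation.

```python
import torch
import torch.nn as nn

class GatedPrunableAttention(nn.Module):
    """Single-head self-attention with a Gaussian-prior position bias and
    learnable query-wise / dependency-wise gates, in the spirit of GRPE and
    APFormer's adaptive pruning (1D stand-in for the paper's 2D setting)."""

    def __init__(self, dim: int, num_tokens: int, sigma: float = 8.0,
                 keep_threshold: float = 0.1):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        self.keep_threshold = keep_threshold
        # Gaussian-prior position bias: the log of a Gaussian heatmap centred
        # on each query position, added to the attention logits.
        pos = torch.arange(num_tokens).float()
        rel = pos[None, :] - pos[:, None]                   # (N, N) offsets
        self.register_buffer("pos_bias", -(rel ** 2) / (2 * sigma ** 2))
        # Gate logits: one per query (query-wise) and one per query-key pair
        # (dependency-wise); a sigmoid maps them into [0, 1].
        self.q_gate = nn.Parameter(torch.zeros(num_tokens))
        self.dep_gate = nn.Parameter(torch.zeros(num_tokens, num_tokens))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale + self.pos_bias
        attn = attn.softmax(dim=-1)
        # Dependency-wise pruning: drop weak query-key dependencies, then
        # renormalise the surviving attention weights.
        dep = torch.sigmoid(self.dep_gate)
        dep = dep * (dep > self.keep_threshold).float()     # hard prune
        attn = attn * dep
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        out = self.proj(attn @ v)
        # Query-wise pruning: queries whose gate is closed skip attention
        # entirely and pass their input through unchanged.
        qg = torch.sigmoid(self.q_gate)
        qg = (qg * (qg > self.keep_threshold).float())[None, :, None]
        return qg * out + (1 - qg) * x
```

The hard thresholds above are shown for inference-time pruning; training such gates end-to-end would require a differentiable relaxation (e.g. a straight-through estimator), and in APFormer the attention itself is additionally supervised by SSA as described in the abstract.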
Related papers
- Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers [14.756988176469365]
An effective approach to reduce computational requirements and increase efficiency is to prune unnecessary components of Deep Neural Networks.
Previous work has shown that attribution methods from the field of eXplainable AI serve as effective means to extract and prune the least relevant network components in a few-shot fashion.
arXiv Detail & Related papers (2024-08-22T17:35:18Z)
- Adaptive Step-size Perception Unfolding Network with Non-local Hybrid Attention for Hyperspectral Image Reconstruction [0.39134031118910273]
We propose an adaptive step-size perception unfolding network (ASPUN), a deep unfolding network based on the FISTA algorithm.
In addition, we design a Non-local Hybrid Attention Transformer (NHAT) module to fully leverage the receptive-field advantage of transformers.
Experimental results show that our ASPUN is superior to the existing SOTA algorithms and achieves the best performance.
arXiv Detail & Related papers (2024-07-04T16:09:52Z)
- Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution [16.54421804141835]
The high resolution of intermediate features in SISR models increases memory and computational requirements.
We propose a Deployment-friendly Inner-patch Transformer Network (DITN) for the SISR task.
Our models can achieve competitive results in terms of qualitative and quantitative performance with high deployment efficiency.
arXiv Detail & Related papers (2023-08-05T05:42:51Z)
- AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation [51.82915587228898]
We introduce AdaptiveClick -- a transformer-based, mask-adaptive segmentation framework for Interactive Image Segmentation (IIS).
The key ingredient of our method is the Click-Aware Mask-adaptive transformer Decoder (CAMD), which enhances the interaction between click and image features.
With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-07T13:47:35Z)
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings in relation to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature-processing phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations [61.95114821573875]
We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyperparameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
arXiv Detail & Related papers (2022-01-31T02:12:45Z)
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than a 2x improvement in efficiency compared to state-of-the-art vision transformers with only a 0.8% drop in accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z)
- HRFormer: High-Resolution Transformer for Dense Prediction [99.6060997466614]
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks.
We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet).
We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-10-18T15:37:58Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends existing architectures by introducing an additional control mechanism in the self-attention module (sketched below).
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
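As a rough illustration of the "additional control mechanism" mentioned in the Medical Transformer entry, the sketch below gates the positional terms of self-attention with learnable scalars, loosely in the spirit of gated axial-attention. It is a 1D, single-head simplification, and the parameter names (r_q, g_q, and so on) are illustrative rather than the paper's notation.

```python
import torch
import torch.nn as nn

class GatedPositionalAttention(nn.Module):
    """Self-attention whose relative-position terms are modulated by learnable
    scalar gates, so the model can down-weight positional priors when they are
    unreliable (e.g. on small medical datasets)."""

    def __init__(self, dim: int, num_tokens: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        # Learnable relative position embeddings for query/key/value paths.
        self.r_q = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.r_k = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.r_v = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        # Scalar gates acting as the "control mechanism": starting at zero,
        # the network begins with purely content-based attention and learns
        # how much positional information to admit.
        self.g_q = nn.Parameter(torch.zeros(1))
        self.g_k = nn.Parameter(torch.zeros(1))
        self.g_v = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)
                  + self.g_q * (q @ self.r_q.T)
                  + self.g_k * (k @ self.r_k.T)) * self.scale
        attn = logits.softmax(dim=-1)
        # The value path also receives a gated positional contribution.
        return attn @ v + self.g_v * (attn @ self.r_v)
```

Zero-initialised gates mean the positional terms contribute nothing at the start of training, which matches the motivation of learning how far to trust position priors when annotated medical data are scarce.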