DAE-Former: Dual Attention-guided Efficient Transformer for Medical
Image Segmentation
- URL: http://arxiv.org/abs/2212.13504v3
- Date: Wed, 26 Jul 2023 18:32:36 GMT
- Title: DAE-Former: Dual Attention-guided Efficient Transformer for Medical
Image Segmentation
- Authors: Reza Azad, Ren\'e Arimond, Ehsan Khodapanah Aghdam, Amirhossein
Kazerouni, Dorit Merhof
- Abstract summary: We propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism.
Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights.
- Score: 3.9548535445908928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have recently gained attention in the computer vision domain due
to their ability to model long-range dependencies. However, the self-attention
mechanism, which is the core part of the Transformer model, usually suffers
from quadratic computational complexity with respect to the number of tokens.
Many architectures attempt to reduce model complexity by limiting the
self-attention mechanism to local regions or by redesigning the tokenization
process. In this paper, we propose DAE-Former, a novel method that seeks to
provide an alternative perspective by efficiently designing the self-attention
mechanism. More specifically, we reformulate the self-attention mechanism to
capture both spatial and channel relations across the whole feature dimension
while staying computationally efficient. Furthermore, we redesign the skip
connection path by including the cross-attention module to ensure the feature
reusability and enhance the localization power. Our method outperforms
state-of-the-art methods on multi-organ cardiac and skin lesion segmentation
datasets without requiring pre-training weights. The code is publicly available
at https://github.com/mindflow-institue/DAEFormer.
Related papers
- Beyond Self-Attention: Deformable Large Kernel Attention for Medical
Image Segmentation [3.132430938881454]
We introduce the concept of textbfDeformable Large Kernel Attention (D-LKA Attention), a streamlined attention mechanism employing large convolution kernels to fully appreciate volumetric context.
Our proposed attention mechanism benefits from deformable convolutions to flexibly warp the sampling grid, enabling the model to adapt appropriately to diverse data patterns.
arXiv Detail & Related papers (2023-08-31T20:21:12Z) - PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and
Progressive Shift [139.17852337764586]
Vision Transformer (ViT) has shown great potential for various visual tasks due to its ability to model long-range dependency.
We propose a ladder self-attention block with multiple branches and a progressive shift mechanism to develop a light-weight transformer backbone.
arXiv Detail & Related papers (2023-04-07T05:21:37Z) - How Much Does Attention Actually Attend? Questioning the Importance of
Attention in Pretrained Transformers [59.57128476584361]
We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones.
We find that without any input-dependent attention, all models achieve competitive performance.
We show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success.
arXiv Detail & Related papers (2022-11-07T12:37:54Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Rethinking Query-Key Pairwise Interactions in Vision Transformers [5.141895475956681]
We propose key-only attention, which excludes query-key pairwise interactions and uses a compute-efficient saliency-gate to obtain attention weights.
We develop a new self-attention model family, LinGlos, which reach state-of-the-art accuracies on the parameter-limited setting of ImageNet classification benchmark.
arXiv Detail & Related papers (2022-07-01T03:36:49Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Learned Queries for Efficient Local Attention [11.123272845092611]
Self-attention mechanism in vision transformers suffers from high latency and inefficient memory utilization.
We propose a new shift-invariant local attention layer, called query and attend (QnA), that aggregates the input locally in an overlapping manner.
We show improvements in speed and memory complexity while achieving comparable accuracy with state-of-the-art models.
arXiv Detail & Related papers (2021-12-21T18:52:33Z) - Shunted Self-Attention via Multi-Scale Token Aggregation [124.16925784748601]
Recent Vision Transformer(ViT) models have demonstrated encouraging results across various computer vision tasks.
We propose shunted self-attention(SSA) that allows ViTs to model the attentions at hybrid scales per attention layer.
The SSA-based transformer achieves 84.0% Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on ImageNet.
arXiv Detail & Related papers (2021-11-30T08:08:47Z) - Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computation complexity.
We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention.
An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z) - UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [6.646135062704341]
Transformer architecture has emerged to be successful in a number of natural language processing tasks.
We present UTNet, a powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation.
arXiv Detail & Related papers (2021-07-02T00:56:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.