DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation
- URL: http://arxiv.org/abs/2212.13504v3
- Date: Wed, 26 Jul 2023 18:32:36 GMT
- Title: DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation
- Authors: Reza Azad, René Arimond, Ehsan Khodapanah Aghdam, Amirhossein Kazerouni, Dorit Merhof
- Abstract summary: We propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism.
Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights.
- Score: 3.9548535445908928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have recently gained attention in the computer vision domain due
to their ability to model long-range dependencies. However, the self-attention
mechanism, which is the core part of the Transformer model, usually suffers
from quadratic computational complexity with respect to the number of tokens.
Many architectures attempt to reduce model complexity by limiting the
self-attention mechanism to local regions or by redesigning the tokenization
process. In this paper, we propose DAE-Former, a novel method that seeks to
provide an alternative perspective by efficiently designing the self-attention
mechanism. More specifically, we reformulate the self-attention mechanism to
capture both spatial and channel relations across the whole feature dimension
while staying computationally efficient. Furthermore, we redesign the skip
connection path by including a cross-attention module to ensure feature
reusability and enhance localization power. Our method outperforms
state-of-the-art methods on multi-organ cardiac and skin lesion segmentation
datasets without requiring pre-training weights. The code is publicly available
at https://github.com/mindflow-institue/DAEFormer.
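
As a rough illustration of the attention reformulation the abstract describes, below is a minimal PyTorch sketch, not the authors' implementation: the single-head simplification, module names, and projection layout are assumptions. It shows the two linear-complexity ingredients the abstract alludes to, an efficient spatial attention in the style of Shen et al. and a transpose-style channel attention.

```python
import torch
import torch.nn as nn

class EfficientSpatialAttention(nn.Module):
    """Linear-complexity spatial attention (Shen et al. style): softmax is
    applied to queries over channels and to keys over tokens, so the (d x d)
    context K^T V is formed before Q is applied -- O(N d^2) cost instead of
    the O(N^2 d) of standard self-attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)  # assumed projection layout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d) -- N tokens of dimension d
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q = q.softmax(dim=-1)            # each query normalized over channels
        k = k.softmax(dim=1)             # each channel normalized over tokens
        context = k.transpose(1, 2) @ v  # (B, d, d) global context matrix
        return q @ context               # (B, N, d)

class ChannelAttention(nn.Module):
    """'Transpose' attention: the (d x d) affinity matrix relates channels
    rather than tokens, capturing cross-channel dependencies at a cost that
    is likewise linear in the number of tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        attn = (k.transpose(1, 2) @ q).softmax(dim=-1)  # (B, d, d) channel affinities
        return v @ attn                                  # (B, N, d)
```

A dual attention block would compose these two with the usual normalization layers and residual connections; the paper's exact composition may differ from this sketch.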
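
The redesigned skip path can be sketched in the same spirit. The wiring below, in which the upsampled decoder feature queries the encoder skip feature so the decoder selects which encoder detail to reuse, is one plausible arrangement assumed for illustration, not necessarily the paper's exact cross-attention module.

```python
import torch
import torch.nn as nn

class SkipCrossAttention(nn.Module):
    """Cross-attention on the skip-connection path. Queries come from the
    decoder stream; keys and values come from the encoder skip feature.
    This wiring is an assumption for illustration."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm_dec = nn.LayerNorm(dim)
        self.norm_enc = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, dec: torch.Tensor, enc: torch.Tensor) -> torch.Tensor:
        # dec: upsampled decoder feature, enc: encoder skip feature; both (B, N, d)
        q = self.norm_dec(dec)
        kv = self.norm_enc(enc)
        out, _ = self.attn(q, kv, kv, need_weights=False)
        return dec + out  # residual keeps the decoder stream intact
```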
Related papers
- RecurFormer: Not All Transformer Heads Need Self-Attention [14.331807060659902]
Transformer-based large language models (LLMs) excel in modeling complex language patterns but face significant computational costs during inference.
We propose RecurFormer, a novel architecture that replaces certain attention heads with linear recurrent neural networks.
arXiv Detail & Related papers (2024-10-10T15:24:12Z)
- TransDAE: Dual Attention Mechanism in a Hierarchical Transformer for Efficient Medical Image Segmentation [7.013315283888431]
Medical image segmentation is crucial for accurate disease diagnosis and the development of effective treatment strategies.
We introduce TransDAE, a novel approach that reimagines the self-attention mechanism to include both spatial and channel-wise associations.
Remarkably, TransDAE outperforms existing state-of-the-art methods on the Synapse multi-organ dataset.
arXiv Detail & Related papers (2024-09-03T16:08:48Z)
- Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation [3.132430938881454]
We introduce the concept of Deformable Large Kernel Attention (D-LKA Attention), a streamlined attention mechanism employing large convolution kernels to fully appreciate volumetric context.
Our proposed attention mechanism benefits from deformable convolutions to flexibly warp the sampling grid, enabling the model to adapt appropriately to diverse data patterns.
arXiv Detail & Related papers (2023-08-31T20:21:12Z)
- PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift [139.17852337764586]
Vision Transformer (ViT) has shown great potential for various visual tasks due to its ability to model long-range dependency.
We propose a ladder self-attention block with multiple branches and a progressive shift mechanism to develop a light-weight transformer backbone.
arXiv Detail & Related papers (2023-04-07T05:21:37Z)
- How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers [59.57128476584361]
We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones.
We find that without any input-dependent attention, all models achieve competitive performance.
We show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success.
arXiv Detail & Related papers (2022-11-07T12:37:54Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Shunted Self-Attention via Multi-Scale Token Aggregation [124.16925784748601]
Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks.
We propose shunted self-attention (SSA) that allows ViTs to model the attentions at hybrid scales per attention layer.
The SSA-based transformer achieves 84.0% Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on ImageNet.
arXiv Detail & Related papers (2021-11-30T08:08:47Z)
- Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computation complexity.
We show that most sparse attention patterns used in existing sparse transformers can inspire the design of such a factorization for full attention.
An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z)
- UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [6.646135062704341]
The Transformer architecture has proven successful in a number of natural language processing tasks.
We present UTNet, a powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation.
arXiv Detail & Related papers (2021-07-02T00:56:27Z)