Related papers: DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation

DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation

URL: http://arxiv.org/abs/2212.13504v3
Date: Wed, 26 Jul 2023 18:32:36 GMT
Title: DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation
Authors: Reza Azad, Ren\'e Arimond, Ehsan Khodapanah Aghdam, Amirhossein Kazerouni, Dorit Merhof
Abstract summary: We propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights.
Score: 3.9548535445908928
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have recently gained attention in the computer vision domain due to their ability to model long-range dependencies. However, the self-attention mechanism, which is the core part of the Transformer model, usually suffers from quadratic computational complexity with respect to the number of tokens. Many architectures attempt to reduce model complexity by limiting the self-attention mechanism to local regions or by redesigning the tokenization process. In this paper, we propose DAE-Former, a novel method that seeks to provide an alternative perspective by efficiently designing the self-attention mechanism. More specifically, we reformulate the self-attention mechanism to capture both spatial and channel relations across the whole feature dimension while staying computationally efficient. Furthermore, we redesign the skip connection path by including the cross-attention module to ensure the feature reusability and enhance the localization power. Our method outperforms state-of-the-art methods on multi-organ cardiac and skin lesion segmentation datasets without requiring pre-training weights. The code is publicly available at https://github.com/mindflow-institue/DAEFormer.

Related papers

Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions [94.21989689001848]
We propose (Delta)ConvFusion to replace conventional self-attention modules with Pyramid Convolution Blocks ((Delta)ConvBlocks) By distilling attention patterns into localized convolutional operations while keeping other components frozen, (Delta)ConvFusion achieves performance comparable to transformer-based counterparts while reducing computational cost by 6929$times$ and surpassing LinFusion by 5.42$times$ in efficiency--all without compromising generative fidelity.
arXiv Detail & Related papers (2025-04-30T03:57:28Z)
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
RecurFormer: Not All Transformer Heads Need Self-Attention [14.331807060659902]
Transformer-based large language models (LLMs) excel in modeling complex language patterns but face significant computational costs during inference. We propose RecurFormer, a novel architecture that replaces certain attention heads with linear recurrent neural networks.
arXiv Detail & Related papers (2024-10-10T15:24:12Z)
TransDAE: Dual Attention Mechanism in a Hierarchical Transformer for Efficient Medical Image Segmentation [7.013315283888431]
Medical image segmentation is crucial for accurate disease diagnosis and the development of effective treatment strategies. We introduce TransDAE: a novel approach that reimagines the self-attention mechanism to include both spatial and channel-wise associations. Remarkably, TransDAE outperforms existing state-of-the-art methods on the Synaps multi-organ dataset.
arXiv Detail & Related papers (2024-09-03T16:08:48Z)
Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation [3.132430938881454]
We introduce the concept of textbfDeformable Large Kernel Attention (D-LKA Attention), a streamlined attention mechanism employing large convolution kernels to fully appreciate volumetric context. Our proposed attention mechanism benefits from deformable convolutions to flexibly warp the sampling grid, enabling the model to adapt appropriately to diverse data patterns.
arXiv Detail & Related papers (2023-08-31T20:21:12Z)
PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift [139.17852337764586]
Vision Transformer (ViT) has shown great potential for various visual tasks due to its ability to model long-range dependency. We propose a ladder self-attention block with multiple branches and a progressive shift mechanism to develop a light-weight transformer backbone.
arXiv Detail & Related papers (2023-04-07T05:21:37Z)
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers [59.57128476584361]
We introduce PAPA, a new probing method that replaces the input-dependent attention matrices with constant ones. We find that without any input-dependent attention, all models achieve competitive performance. We show that better-performing models lose more from applying our method than weaker models, suggesting that the utilization of the input-dependent attention mechanism might be a factor in their success.
arXiv Detail & Related papers (2022-11-07T12:37:54Z)
Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision. A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive. We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation. It simultaneously learns global semantic information and local spatial-detailed features. Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
Shunted Self-Attention via Multi-Scale Token Aggregation [124.16925784748601]
Recent Vision Transformer(ViT) models have demonstrated encouraging results across various computer vision tasks. We propose shunted self-attention(SSA) that allows ViTs to model the attentions at hybrid scales per attention layer. The SSA-based transformer achieves 84.0% Top-1 accuracy and outperforms the state-of-the-art Focal Transformer on ImageNet.
arXiv Detail & Related papers (2021-11-30T08:08:47Z)
Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
We propose Combiner, which provides full attention capability in each attention head while maintaining low computation complexity. We show that most sparse attention patterns used in existing sparse transformers are able to inspire the design of such factorization for full attention. An experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
arXiv Detail & Related papers (2021-07-12T22:43:11Z)
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [6.646135062704341]
Transformer architecture has emerged to be successful in a number of natural language processing tasks. We present UTNet, a powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation.
arXiv Detail & Related papers (2021-07-02T00:56:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.