Beyond Self-Attention: Deformable Large Kernel Attention for Medical
Image Segmentation
- URL: http://arxiv.org/abs/2309.00121v1
- Date: Thu, 31 Aug 2023 20:21:12 GMT
- Title: Beyond Self-Attention: Deformable Large Kernel Attention for Medical
Image Segmentation
- Authors: Reza Azad, Leon Niggemeier, Michael Huttemann, Amirhossein Kazerouni,
Ehsan Khodapanah Aghdam, Yury Velichko, Ulas Bagci, Dorit Merhof
- Abstract summary: We introduce the concept of Deformable Large Kernel Attention (D-LKA Attention), a streamlined attention mechanism employing large convolution kernels to fully appreciate volumetric context.
Our proposed attention mechanism benefits from deformable convolutions to flexibly warp the sampling grid, enabling the model to adapt appropriately to diverse data patterns.
- Score: 3.132430938881454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image segmentation has seen significant improvements with transformer
models, which excel in grasping far-reaching contexts and global contextual
information. However, the increasing computational demands of these models,
proportional to the squared token count, limit their depth and resolution
capabilities. Most current methods process 3D volumetric image data
slice-by-slice (called pseudo 3D), missing crucial inter-slice information and
thus reducing the model's overall performance. To address these challenges, we
introduce the concept of Deformable Large Kernel Attention (D-LKA
Attention), a streamlined attention mechanism employing large convolution
kernels to fully appreciate volumetric context. This mechanism operates within
a receptive field akin to self-attention while sidestepping the computational
overhead. Additionally, our proposed attention mechanism benefits from
deformable convolutions to flexibly warp the sampling grid, enabling the model
to adapt appropriately to diverse data patterns. We designed both 2D and 3D
adaptations of the D-LKA Attention, with the latter excelling in cross-depth
data understanding. Together, these components shape our novel hierarchical
Vision Transformer architecture, the D-LKA Net. Evaluations of our
model against leading methods on popular medical segmentation datasets
(Synapse, NIH Pancreas, and Skin lesion) demonstrate its superior performance.
Our code implementation is publicly available at:
https://github.com/mindflow-institue/deformableLKA
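As a rough, non-authoritative illustration of the mechanism described above, the PyTorch sketch below gates features with a large-kernel attention whose dilated branch samples on a deformable grid (via torchvision's DeformConv2d). The 5x5 and 7x7 kernel sizes, the offset head, and the choice to make only the dilated branch deformable are assumptions made for brevity, not the authors' exact design; the linked repository contains the reference implementation.

```python
# Minimal 2D sketch of deformable large-kernel attention (assumes PyTorch +
# torchvision). Kernel sizes, the offset head, and making only the dilated
# branch deformable are illustrative choices, not the authors' design.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableLargeKernelAttention2D(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 7, dilation: int = 3):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2
        # Local depthwise convolution.
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # Offset head predicts (dx, dy) for every tap of the dilated kernel;
        # zero init so training starts from a regular (undeformed) grid.
        self.offset = nn.Conv2d(dim, 2 * kernel_size * kernel_size, 3, padding=1)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)
        # Dilated depthwise convolution with a deformable sampling grid gives
        # a large, data-adaptive receptive field at convolutional cost.
        self.dw_dilated = DeformConv2d(dim, dim, kernel_size, padding=pad,
                                       dilation=dilation, groups=dim)
        # Pointwise convolution mixes channels before gating.
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw(x)
        attn = self.dw_dilated(attn, self.offset(attn))
        attn = self.pw(attn)
        return x * attn  # attention used as an elementwise gate, LKA-style


if __name__ == "__main__":
    feats = torch.randn(2, 32, 64, 64)
    out = DeformableLargeKernelAttention2D(32)(feats)
    print(out.shape)  # torch.Size([2, 32, 64, 64])
```

A 3D adaptation would follow the same pattern with volumetric convolutions and a 3D deformable sampling operator (not provided by stock torchvision), which is the part the paper argues matters for cross-depth context.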
Related papers
- SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation [0.13654846342364302]
We present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features.
SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features.
We benchmark SegFormer3D against the current SOTA models on three widely used datasets.
arXiv Detail & Related papers (2024-04-15T22:12:05Z)
- Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain [48.440691680864745]
We introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method.
LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy.
We propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets.
arXiv Detail & Related papers (2024-02-09T05:06:58Z)
- PMFSNet: Polarized Multi-scale Feature Self-attention Network for Lightweight Medical Image Segmentation [6.134314911212846]
Current state-of-the-art medical image segmentation methods prioritize accuracy but often at the expense of increased computational demands and larger model sizes.
We propose PMFSNet, a novel medical imaging segmentation model that balances global and local feature processing while avoiding computational redundancy.
It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies.
arXiv Detail & Related papers (2024-01-15T10:26:47Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
- DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present the Deformable Attention Transformer (DAT++), an efficient and effective vision backbone for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z)
- Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection [3.784298636620067]
Vision Transformer (ViT) models have demonstrated a breakthrough in a wide range of computer vision tasks.
However, these models struggle to capture high-frequency components of images, which can limit their ability to detect local textures and edge information.
We propose a new technique, Laplacian-Former, that enhances the self-attention map by adaptively re-calibrating the frequency information in a Laplacian pyramid.
arXiv Detail & Related papers (2023-08-31T19:56:14Z)
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It adopts a traditional U-shaped encoder-decoder structure incorporating a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have achieved promising performance on many biomedical image segmentation tasks.
The main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
arXiv Detail & Related papers (2022-06-01T21:15:01Z)
- Vision Transformer with Deformable Attention [29.935891419574602]
A large, sometimes even global, receptive field endows Transformer models with higher representation power than their CNN counterparts.
We propose a novel deformable self-attention module, where the positions of the key and value pairs in self-attention are selected in a data-dependent way.
We present the Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks; a minimal sketch of the data-dependent sampling idea appears after this list.
arXiv Detail & Related papers (2022-01-03T08:29:01Z)
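Since several entries above (and D-LKA itself) build on the idea of sampling keys and values at data-dependent positions, the following single-head PyTorch sketch illustrates that idea only. The reference-grid size, the offset head, and the single-scale, single-head setup are simplifying assumptions and do not reproduce DAT/DAT++ exactly.

```python
# Single-head sketch of deformable attention: offsets predicted from the
# queries shift a uniform grid of reference points; keys and values are
# bilinearly sampled at the deformed positions. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableAttention(nn.Module):
    def __init__(self, dim: int, n_ref: int = 8):
        super().__init__()
        self.scale = dim ** -0.5
        self.n_ref = n_ref  # reference grid is n_ref x n_ref points
        self.to_q = nn.Conv2d(dim, dim, 1)
        self.to_k = nn.Conv2d(dim, dim, 1)
        self.to_v = nn.Conv2d(dim, dim, 1)
        # Offset head: predicts (dx, dy) per reference point from pooled queries.
        self.to_offset = nn.Sequential(
            nn.AdaptiveAvgPool2d(n_ref),
            nn.Conv2d(dim, 2, 1),
            nn.Tanh(),  # keep offsets inside the normalised image extent
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.to_q(x)

        # Uniform reference grid in [-1, 1], shape (b, n_ref, n_ref, 2).
        xs = torch.linspace(-1, 1, self.n_ref, device=x.device)
        ys = torch.linspace(-1, 1, self.n_ref, device=x.device)
        ref = torch.stack(torch.meshgrid(xs, ys, indexing="xy"), dim=-1)
        ref = ref.expand(b, -1, -1, -1)

        # Data-dependent offsets deform the sampling positions.
        offset = self.to_offset(q).permute(0, 2, 3, 1)  # (b, n_ref, n_ref, 2)
        pos = (ref + offset).clamp(-1, 1)

        # Sample keys/values at the deformed positions.
        k = F.grid_sample(self.to_k(x), pos, align_corners=True)
        v = F.grid_sample(self.to_v(x), pos, align_corners=True)

        # Scaled dot-product attention between all queries and the small set
        # of sampled key/value tokens.
        q = q.flatten(2).transpose(1, 2)   # (b, h*w, c)
        k = k.flatten(2)                   # (b, c, n_ref*n_ref)
        v = v.flatten(2).transpose(1, 2)   # (b, n_ref*n_ref, c)
        attn = torch.softmax(q @ k * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out


if __name__ == "__main__":
    x = torch.randn(2, 32, 32, 32)
    print(SimpleDeformableAttention(32)(x).shape)  # torch.Size([2, 32, 32, 32])
```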
This list is automatically generated from the titles and abstracts of the papers on this site.