Beyond Self-Attention: Deformable Large Kernel Attention for Medical
Image Segmentation
- URL: http://arxiv.org/abs/2309.00121v1
- Date: Thu, 31 Aug 2023 20:21:12 GMT
- Title: Beyond Self-Attention: Deformable Large Kernel Attention for Medical
Image Segmentation
- Authors: Reza Azad, Leon Niggemeier, Michael Huttemann, Amirhossein Kazerouni,
Ehsan Khodapanah Aghdam, Yury Velichko, Ulas Bagci, Dorit Merhof
- Abstract summary: We introduce the concept of Deformable Large Kernel Attention (D-LKA Attention), a streamlined attention mechanism employing large convolution kernels to fully appreciate volumetric context.
Our proposed attention mechanism benefits from deformable convolutions to flexibly warp the sampling grid, enabling the model to adapt appropriately to diverse data patterns.
- Score: 3.132430938881454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image segmentation has seen significant improvements with transformer
models, which excel in grasping far-reaching contexts and global contextual
information. However, the increasing computational demands of these models,
proportional to the squared token count, limit their depth and resolution
capabilities. Most current methods process 3D volumetric image data
slice-by-slice (called pseudo 3D), missing crucial inter-slice information and
thus reducing the model's overall performance. To address these challenges, we
introduce the concept of Deformable Large Kernel Attention (D-LKA
Attention), a streamlined attention mechanism employing large convolution
kernels to fully appreciate volumetric context. This mechanism operates within
a receptive field akin to self-attention while sidestepping the computational
overhead. Additionally, our proposed attention mechanism benefits from
deformable convolutions to flexibly warp the sampling grid, enabling the model
to adapt appropriately to diverse data patterns. We designed both 2D and 3D
adaptations of the D-LKA Attention, with the latter excelling in cross-depth
data understanding. Together, these components shape our novel hierarchical
Vision Transformer architecture, the D-LKA Net. Evaluations of our
model against leading methods on popular medical segmentation datasets
(Synapse, NIH Pancreas, and Skin lesion) demonstrate its superior performance.
Our code implementation is publicly available at:
https://github.com/mindflow-institue/deformableLKA
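As a rough, non-authoritative illustration of the mechanism described above, the PyTorch sketch below gates features with a large-kernel attention whose dilated branch samples on a deformable grid (via torchvision's DeformConv2d). The 5x5 and 7x7 kernel sizes, the offset head, and the choice to make only the dilated branch deformable are assumptions made for brevity, not the authors' exact design; the linked repository contains the reference implementation.

```python
# Minimal 2D sketch of deformable large-kernel attention (assumes PyTorch +
# torchvision). Kernel sizes, the offset head, and making only the dilated
# branch deformable are illustrative choices, not the authors' design.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableLargeKernelAttention2D(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 7, dilation: int = 3):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2
        # Local depthwise convolution.
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # Offset head predicts (dx, dy) for every tap of the dilated kernel;
        # zero init so training starts from a regular (undeformed) grid.
        self.offset = nn.Conv2d(dim, 2 * kernel_size * kernel_size, 3, padding=1)
        nn.init.zeros_(self.offset.weight)
        nn.init.zeros_(self.offset.bias)
        # Dilated depthwise convolution with a deformable sampling grid gives
        # a large, data-adaptive receptive field at convolutional cost.
        self.dw_dilated = DeformConv2d(dim, dim, kernel_size, padding=pad,
                                       dilation=dilation, groups=dim)
        # Pointwise convolution mixes channels before gating.
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.dw(x)
        attn = self.dw_dilated(attn, self.offset(attn))
        attn = self.pw(attn)
        return x * attn  # attention used as an elementwise gate, LKA-style


if __name__ == "__main__":
    feats = torch.randn(2, 32, 64, 64)
    out = DeformableLargeKernelAttention2D(32)(feats)
    print(out.shape)  # torch.Size([2, 32, 64, 64])
```

A 3D adaptation would follow the same pattern with volumetric convolutions and a 3D deformable sampling operator (not provided by stock torchvision), which is the part the paper argues matters for cross-depth context.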
Related papers
- SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation [0.13654846342364302]
We present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features.
SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features.
We benchmark SegFormer3D against the current SOTA models on three widely used datasets.
arXiv Detail & Related papers (2024-04-15T22:12:05Z)
- Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain [48.440691680864745]
We introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method.
LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy.
We propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets.
arXiv Detail & Related papers (2024-02-09T05:06:58Z)
- PMFSNet: Polarized Multi-scale Feature Self-attention Network for Lightweight Medical Image Segmentation [6.134314911212846]
Current state-of-the-art medical image segmentation methods prioritize accuracy but often at the expense of increased computational demands and larger model sizes.
We propose PMFSNet, a novel medical imaging segmentation model that balances global and local feature processing while avoiding computational redundancy.
It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies.
arXiv Detail & Related papers (2024-01-15T10:26:47Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
- DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present the Deformable Attention Transformer (DAT++), an efficient and effective vision backbone for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z)
- Laplacian-Former: Overcoming the Limitations of Vision Transformers in Local Texture Detection [3.784298636620067]
Vision Transformer (ViT) models have demonstrated a breakthrough in a wide range of computer vision tasks.
However, these models struggle to capture high-frequency components of images, which can limit their ability to detect local textures and edge information.
We propose a new technique, Laplacian-Former, that enhances the self-attention map by adaptively re-calibrating the frequency information in a Laplacian pyramid.
arXiv Detail & Related papers (2023-08-31T19:56:14Z)
- Lesion-aware Dynamic Kernel for Polyp Segmentation [49.63274623103663]
We propose a lesion-aware dynamic network (LDNet) for polyp segmentation.
It adopts a traditional U-shaped encoder-decoder structure incorporating a dynamic kernel generation and updating scheme.
This simple but effective scheme endows our model with powerful segmentation performance and generalization capability.
arXiv Detail & Related papers (2023-01-12T09:53:57Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have achieved promising performance on many biomedical image segmentation tasks.
The main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
arXiv Detail & Related papers (2022-06-01T21:15:01Z)
- Vision Transformer with Deformable Attention [29.935891419574602]
A large, sometimes even global, receptive field endows Transformer models with higher representation power than their CNN counterparts.
We propose a novel deformable self-attention module, where the positions of the key and value pairs in self-attention are selected in a data-dependent way.
We present the Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks; a minimal sketch of the data-dependent sampling idea appears after this list.
arXiv Detail & Related papers (2022-01-03T08:29:01Z)
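Since several entries above (and D-LKA itself) build on the idea of sampling keys and values at data-dependent positions, the following single-head PyTorch sketch illustrates that idea only. The reference-grid size, the offset head, and the single-scale, single-head setup are simplifying assumptions and do not reproduce DAT/DAT++ exactly.

```python
# Single-head sketch of deformable attention: offsets predicted from the
# queries shift a uniform grid of reference points; keys and values are
# bilinearly sampled at the deformed positions. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableAttention(nn.Module):
    def __init__(self, dim: int, n_ref: int = 8):
        super().__init__()
        self.scale = dim ** -0.5
        self.n_ref = n_ref  # reference grid is n_ref x n_ref points
        self.to_q = nn.Conv2d(dim, dim, 1)
        self.to_k = nn.Conv2d(dim, dim, 1)
        self.to_v = nn.Conv2d(dim, dim, 1)
        # Offset head: predicts (dx, dy) per reference point from pooled queries.
        self.to_offset = nn.Sequential(
            nn.AdaptiveAvgPool2d(n_ref),
            nn.Conv2d(dim, 2, 1),
            nn.Tanh(),  # keep offsets inside the normalised image extent
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.to_q(x)

        # Uniform reference grid in [-1, 1], shape (b, n_ref, n_ref, 2).
        xs = torch.linspace(-1, 1, self.n_ref, device=x.device)
        ys = torch.linspace(-1, 1, self.n_ref, device=x.device)
        ref = torch.stack(torch.meshgrid(xs, ys, indexing="xy"), dim=-1)
        ref = ref.expand(b, -1, -1, -1)

        # Data-dependent offsets deform the sampling positions.
        offset = self.to_offset(q).permute(0, 2, 3, 1)  # (b, n_ref, n_ref, 2)
        pos = (ref + offset).clamp(-1, 1)

        # Sample keys/values at the deformed positions.
        k = F.grid_sample(self.to_k(x), pos, align_corners=True)
        v = F.grid_sample(self.to_v(x), pos, align_corners=True)

        # Scaled dot-product attention between all queries and the small set
        # of sampled key/value tokens.
        q = q.flatten(2).transpose(1, 2)   # (b, h*w, c)
        k = k.flatten(2)                   # (b, c, n_ref*n_ref)
        v = v.flatten(2).transpose(1, 2)   # (b, n_ref*n_ref, c)
        attn = torch.softmax(q @ k * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out


if __name__ == "__main__":
    x = torch.randn(2, 32, 32, 32)
    print(SimpleDeformableAttention(32)(x).shape)  # torch.Size([2, 32, 32, 32])
```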
This list is automatically generated from the titles and abstracts of the papers on this site.