DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation
- URL: http://arxiv.org/abs/2106.06716v1
- Date: Sat, 12 Jun 2021 08:37:17 GMT
- Title: DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation
- Authors: Ailiang Lin, Bingzhi Chen, Jiayu Xu, Zheng Zhang, Guangming Lu
- Abstract summary: We propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet)
Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on Swin Transformer to extract the coarse and fine-grained feature representations of different semantic scales.
As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism.
- Score: 18.755217252996754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic medical image segmentation has made great progress, benefiting from the
development of deep learning. However, most existing methods are based on
convolutional neural networks (CNNs), which fail to build long-range
dependencies and global context connections due to the limitation of receptive
field in convolution operation. Inspired by the success of Transformer in
modeling the long-range contextual information, some researchers have expended
considerable effort in designing robust variants of the Transformer-based
U-Net. Moreover, the patch division used in vision transformers usually ignores
the pixel-level intrinsic structural features inside each patch. To alleviate
these problems, we propose a novel deep medical image segmentation framework
called Dual Swin Transformer U-Net (DS-TransUNet), which might be the first
attempt to concurrently incorporate the advantages of hierarchical Swin
Transformer into both encoder and decoder of the standard U-shaped architecture
to enhance the semantic segmentation quality of varying medical images. Unlike
many prior Transformer-based solutions, the proposed DS-TransUNet first adopts
dual-scale encoder subnetworks based on Swin Transformer to extract the coarse
and fine-grained feature representations of different semantic scales. As the
core component for our DS-TransUNet, a well-designed Transformer Interactive
Fusion (TIF) module is proposed to effectively establish global dependencies
between features of different scales through the self-attention mechanism.
Furthermore, we also introduce the Swin Transformer block into the decoder to
further explore the long-range contextual information during the up-sampling
process. Extensive experiments across four typical tasks for medical image
segmentation demonstrate the effectiveness of DS-TransUNet, and show that our
approach significantly outperforms the state-of-the-art methods.
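For illustration, below is a minimal PyTorch-style sketch of how a TIF-like fusion block could connect the two encoder scales through self-attention. The class name TIFBlock, the pooling of the coarse branch into a single cross-scale token, and all dimensions are assumptions made for this sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn


class TIFBlock(nn.Module):
    """Illustrative TIF-style fusion: summarize the coarse branch into one
    global token, prepend it to the fine-branch tokens, and apply standard
    multi-head self-attention so every fine token can attend to cross-scale
    context. Dimensions and token layout are assumptions for the sketch."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, fine_tokens: torch.Tensor, coarse_tokens: torch.Tensor) -> torch.Tensor:
        # fine_tokens:   (B, N_fine, C)   tokens from the fine-grained encoder branch
        # coarse_tokens: (B, N_coarse, C) tokens from the coarse-grained encoder branch
        global_token = coarse_tokens.mean(dim=1, keepdim=True)   # (B, 1, C) summary of the other scale
        x = torch.cat([global_token, fine_tokens], dim=1)        # prepend the cross-scale token
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]        # self-attention builds global dependencies
        x = x + self.mlp(self.norm2(x))
        return x[:, 1:]                                          # keep only the fused fine-branch tokens


if __name__ == "__main__":
    tif = TIFBlock(dim=96)
    fine = torch.randn(2, 56 * 56, 96)      # e.g. tokens from a 4x4-patch branch
    coarse = torch.randn(2, 28 * 28, 96)    # e.g. tokens from an 8x8-patch branch
    print(tif(fine, coarse).shape)          # torch.Size([2, 3136, 96])
```

Summarizing the other branch into a single token keeps the attention cost close to that of a single-scale block while still letting every fine-grained token attend to coarse-scale context.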
Related papers
- Rethinking Attention Gated with Hybrid Dual Pyramid Transformer-CNN for Generalized Segmentation in Medical Imaging [17.07490339960335]
We introduce a novel hybrid CNN-Transformer segmentation architecture (PAG-TransYnet) designed for efficiently building a strong CNN-Transformer encoder.
Our approach exploits attention gates within a Dual Pyramid hybrid encoder.
arXiv Detail & Related papers (2024-04-28T14:37:10Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
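As a rough illustration of the warping idea described in this entry, the following spatial-transformer-style sketch lets a small network predict affine parameters that warp the input image. The LearnedAffineWarp class and its localization network are hypothetical and not the paper's actual AAT design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnedAffineWarp(nn.Module):
    """Spatial-transformer-style sketch: a small network predicts affine
    parameters that warp the input image. Illustrates the general idea of a
    learned affine module; the actual AAT design may differ."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.localize = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 6),
        )
        # start from the identity transform so training begins with unwarped images
        self.localize[-1].weight.data.zero_()
        self.localize[-1].bias.data.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        theta = self.localize(img).view(-1, 2, 3)                     # predicted affine parameters
        grid = F.affine_grid(theta, list(img.size()), align_corners=False)
        return F.grid_sample(img, grid, align_corners=False)          # warped image
```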
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers [40.21263511313524]
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning.
The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks.
To address these limitations, researchers have turned to Transformers, renowned for their global self-attention mechanisms.
arXiv Detail & Related papers (2023-10-11T18:07:19Z)
- TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model [4.320393382724066]
Convolutional neural networks (CNNs) have been the prevailing technique in medical image processing.
We propose TransNorm, a novel deep segmentation framework which consolidates a Transformer module into both the encoder and skip-connections of the standard U-Net.
arXiv Detail & Related papers (2022-07-27T09:54:10Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- MISSFormer: An Effective Medical Image Segmentation Transformer [3.441872541209065]
CNN-based methods have achieved impressive results in medical image segmentation.
Transformer-based methods have recently become popular in vision tasks because of their capacity to model long-range dependencies.
We present MISSFormer, an effective and powerful Medical Image tranSFormer.
arXiv Detail & Related papers (2021-09-15T08:56:00Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper which applies transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
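For intuition, here is a simplified single-axis sketch of gating inside self-attention: attention runs along the height axis of a feature map, and a learnable scalar gate controls a (much simplified) positional term. The GatedAxialAttention1D class and its details are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GatedAxialAttention1D(nn.Module):
    """Illustrative sketch of gated attention along one spatial axis.
    Standard self-attention is applied independently along the height axis,
    and a learnable scalar gate controls how much a simplified learned
    positional term contributes. Not the authors' exact implementation."""

    def __init__(self, dim: int, num_heads: int = 4, axis_len: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.pos = nn.Parameter(torch.zeros(axis_len, dim))  # simplified per-position bias
        self.gate = nn.Parameter(torch.zeros(1))              # starts closed; learns how much position matters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); attend along H independently for each column
        b, c, h, w = x.shape
        tokens = x.permute(0, 3, 2, 1).reshape(b * w, h, c)       # (B*W, H, C)
        tokens = tokens + torch.tanh(self.gate) * self.pos[:h]    # gated positional term
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(b, w, h, c).permute(0, 3, 2, 1)        # back to (B, C, H, W)


# usage: GatedAxialAttention1D(dim=32, num_heads=4, axis_len=16)(torch.randn(1, 32, 16, 16))
```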
arXiv Detail & Related papers (2021-02-21T18:35:14Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.