Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach
using a Dilated Transformer and U-Net Architecture
- URL: http://arxiv.org/abs/2304.11450v1
- Date: Sat, 22 Apr 2023 17:20:13 GMT
- Title: Dilated-UNet: A Fast and Accurate Medical Image Segmentation Approach
using a Dilated Transformer and U-Net Architecture
- Authors: Davoud Saadati, Omid Nejati Manzari, Sattar Mirzakuchaki
- Abstract summary: This paper introduces Dilated-UNet, which combines a Dilated Transformer block with the U-Net architecture for accurate and fast medical image segmentation.
The results of our experiments show that Dilated-UNet outperforms other models on several challenging medical image segmentation datasets.
- Score: 0.6445605125467572
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Medical image segmentation is crucial for the development of computer-aided
diagnostic and therapeutic systems, but still faces numerous difficulties. In
recent years, the commonly used encoder-decoder architecture based on CNNs has
been applied effectively in medical image segmentation, but has limitations in
terms of learning global context and spatial relationships. Some researchers
have attempted to incorporate transformers into both the decoder and encoder
components, with promising results, but this approach still requires further
improvement due to its high computational complexity. This paper introduces
Dilated-UNet, which combines a Dilated Transformer block with the U-Net
architecture for accurate and fast medical image segmentation. Image patches
are transformed into tokens and fed into the U-shaped encoder-decoder
architecture, with skip-connections for local-global semantic feature learning.
The encoder uses a hierarchical Dilated Transformer with a combination of
Neighborhood Attention and Dilated Neighborhood Attention Transformer to
extract local and sparse global attention. The results of our experiments show
that Dilated-UNet outperforms other models on several challenging medical image
segmentation datasets, such as ISIC and Synapse.
Related papers
- 3D TransUNet: Advancing Medical Image Segmentation through Vision
Transformers [40.21263511313524]
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning.
The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks.
To address these limitations, researchers have turned to Transformers, renowned for their global self-attention mechanisms.
arXiv Detail & Related papers (2023-10-11T18:07:19Z) - Focused Decoding Enables 3D Anatomical Detection by Transformers [64.36530874341666]
We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
arXiv Detail & Related papers (2022-07-21T22:17:21Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - LeViT-UNet: Make Faster Encoders with Transformer for Medical Image
Segmentation [6.2059756782278965]
We propose LeViT-UNet, which integrates a LeViT Transformer module into the U-Net architecture.
Specifically, we use LeViT as the encoder of the LeViT-UNet, which better trades off the accuracy and efficiency of the Transformer block.
arXiv Detail & Related papers (2021-07-19T05:48:51Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
tokenized image patches are fed into the Transformer-based U-shaped decoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z) - Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.