TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical
Image Segmentation
- URL: http://arxiv.org/abs/2208.00713v1
- Date: Mon, 1 Aug 2022 09:53:53 GMT
- Title: TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical
Image Segmentation
- Authors: Reza Azad, Moein Heidari, Moein Shariatnia, Ehsan Khodapanah Aghdam,
Sanaz Karimijafarbigloo, Ehsan Adeli, Dorit Merhof
- Abstract summary: This paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation.
We exploit a hierarchical Swin Transformer with shifted windows to extend DeepLabv3 and to model the Atrous Spatial Pyramid Pooling (ASPP) module.
Our approach performs on par with or better than most contemporary works that combine Vision Transformers and CNN-based methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) have been the de facto standard in a
diverse set of computer vision tasks for many years. Especially, deep neural
networks based on seminal architectures such as U-shaped models with
skip-connections or atrous convolution with pyramid pooling have been tailored
to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are adept at capturing versatile local features.
However, as a general consensus, CNNs fail to capture long-range dependencies
and spatial correlations due to the intrinsic property of confined receptive
field size of convolution operations. Alternatively, Transformer, profiting
from global information modelling that stems from the self-attention mechanism,
has recently attained remarkable performance in natural language processing and
computer vision. Nevertheless, previous studies prove that both local and
global features are critical for a deep model in dense prediction, such as
segmenting complicated structures with disparate shapes and configurations. To
this end, this paper proposes TransDeepLab, a novel DeepLab-like pure
Transformer for medical image segmentation. Specifically, we exploit
hierarchical Swin-Transformer with shifted windows to extend the DeepLabv3 and
model the Atrous Spatial Pyramid Pooling (ASPP) module. A thorough search of the relevant literature indicates that we are the first to reformulate the seminal DeepLab model as a pure Transformer-based model. Extensive experiments on
various medical image segmentation tasks verify that our approach performs on par with or better than most contemporary works that combine Vision Transformers and CNN-based methods, while significantly reducing model complexity. The code and trained models are publicly available at
https://github.com/rezazad68/transdeeplab
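The core idea in the abstract is that the atrous rates of DeepLab's ASPP are replaced by Swin-style self-attention computed over non-overlapping windows of several sizes. The sketch below is a conceptual illustration only (not the authors' implementation): per-window mean pooling stands in for the attention computed inside each window, and the function names are hypothetical, but it shows how partitioning one feature map at multiple window sizes yields the multi-scale aggregation that atrous rates provide in the original ASPP.

```python
# Conceptual sketch (NOT the authors' code): multi-scale window partitioning
# as a stand-in for the windowed self-attention that replaces ASPP's atrous
# rates in TransDeepLab. Stdlib Python only; per-window mean pooling is used
# in place of attention to keep the example self-contained.
from typing import Dict, List, Tuple


def window_partition(feat: List[List[float]], win: int) -> List[List[float]]:
    """Split an H x W feature map into non-overlapping win x win windows.

    Returns a list of windows, each flattened to win*win values, in
    row-major order. Assumes H and W are divisible by win (Swin pads the
    input so that this holds).
    """
    h, w = len(feat), len(feat[0])
    assert h % win == 0 and w % win == 0, "pad the map so win divides H and W"
    windows = []
    for i in range(0, h, win):
        for j in range(0, w, win):
            windows.append([feat[i + di][j + dj]
                            for di in range(win) for dj in range(win)])
    return windows


def multi_scale_window_pool(
    feat: List[List[float]],
    window_sizes: Tuple[int, ...] = (1, 2, 4),
) -> Dict[int, List[float]]:
    """ASPP-like multi-scale aggregation: one pooled map per window size.

    Each scale's output is the list of per-window means. Small windows keep
    local detail; large windows capture wide context, playing the role that
    larger atrous rates play in the original convolutional ASPP.
    """
    outputs = {}
    for win in window_sizes:
        wins = window_partition(feat, win)
        outputs[win] = [sum(vals) / len(vals) for vals in wins]
    return outputs


# Tiny 4x4 example feature map with values 0..15.
feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pooled = multi_scale_window_pool(feat)
# window size 4 pools the whole map into a single value: the global mean 7.5
```

In the real model, each scale's windows would pass through shifted-window self-attention and the per-scale outputs would be fused by a cross-attention or concatenation head; the partition-then-aggregate-per-scale structure is the part this sketch illustrates.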
Related papers
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- VM-UNet: Vision Mamba UNet for Medical Image Segmentation [3.170171905334503]
We propose a U-shaped architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet).
We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks.
arXiv Detail & Related papers (2024-02-04T13:37:21Z)
- CompletionFormer: Depth Completion with Convolutions and Vision Transformers [0.0]
This paper proposes a Joint Convolutional Attention and Transformer block (JCAT), which deeply couples the convolutional attention layer and Vision Transformer into one block, as the basic unit for constructing our depth completion model in a pyramidal structure.
Our CompletionFormer outperforms state-of-the-art CNN-based methods on the outdoor KITTI Depth Completion benchmark and the indoor NYUv2 dataset, achieving significantly higher efficiency (nearly 1/3 the FLOPs) compared to pure Transformer-based methods.
arXiv Detail & Related papers (2023-04-25T17:59:47Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatially detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to Transformer-style models and CNNs in efficiency, generalization ability, and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic Inductive Bias from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each Transformer layer, ViTAE has a convolution block in parallel with the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture that benefits from both convolutional neural networks and Transformers.
This is the first paper to apply Transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)