Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors
in MRI Images
- URL: http://arxiv.org/abs/2201.01266v1
- Date: Tue, 4 Jan 2022 18:01:34 GMT
- Title: Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors
in MRI Images
- Authors: Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger Roth
and Daguang Xu
- Abstract summary: We propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR)
The model extracts features at five different resolutions by utilizing shifted windows for computing self-attention.
We have participated in BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase.
- Score: 7.334185314342017
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Semantic segmentation of brain tumors is a fundamental medical image analysis
task involving multiple MRI imaging modalities that can assist clinicians in
diagnosing the patient and successively studying the progression of the
malignant entity. In recent years, Fully Convolutional Neural Networks (FCNNs)
approaches have become the de facto standard for 3D medical image segmentation.
The popular "U-shaped" network architecture has achieved state-of-the-art
performance benchmarks on different 2D and 3D semantic segmentation tasks and
across various imaging modalities. However, due to the limited kernel size of
convolution layers in FCNNs, their performance of modeling long-range
information is sub-optimal, and this can lead to deficiencies in the
segmentation of tumors with variable sizes. On the other hand, transformer
models have demonstrated excellent capabilities in capturing such long-range
information in multiple domains, including natural language processing and
computer vision. Inspired by the success of vision transformers and their
variants, we propose a novel segmentation model termed Swin UNEt TRansformers
(Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is
reformulated as a sequence to sequence prediction problem wherein multi-modal
input data is projected into a 1D sequence of embedding and used as an input to
a hierarchical Swin transformer as the encoder. The swin transformer encoder
extracts features at five different resolutions by utilizing shifted windows
for computing self-attention and is connected to an FCNN-based decoder at each
resolution via skip connections. We have participated in BraTS 2021
segmentation challenge, and our proposed model ranks among the top-performing
approaches in the validation phase. Code: https://monai.io/research/swin-unetr
Related papers
- SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Learning from partially labeled data for multi-organ and tumor
segmentation [102.55303521877933]
We propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple datasets.
A dynamic head enables the network to accomplish multiple segmentation tasks flexibly.
We create a large-scale partially labeled Multi-Organ and Tumor benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors.
arXiv Detail & Related papers (2022-11-13T13:03:09Z) - A Transformer-based Generative Adversarial Network for Brain Tumor
Segmentation [4.394247741333439]
We propose a transformer-based generative adversarial network to automatically segment brain tumors with multi-modalities MRI.
Our architecture consists of a generator and a discriminator, which are trained in min-max game progress.
The discriminator we designed is a CNN-based network with multi-scale $L_1$ loss, which is proved to be effective for medical semantic image segmentation.
arXiv Detail & Related papers (2022-07-28T14:55:18Z) - Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have surpassed promising performance on many biomedical image segmentation tasks.
Main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
arXiv Detail & Related papers (2022-06-01T21:15:01Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - UNetFormer: A Unified Vision Transformer Model and Pre-Training
Framework for 3D Medical Image Segmentation [14.873473285148853]
We introduce a unified framework consisting of two architectures, dubbed UNetFormer, with a 3D Swin Transformer-based encoder and Conal Neural Network (CNN) and transformer-based decoders.
In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision.
We present a methodology for self-supervised pre-training of the encoder backbone via learning to predict randomly masked tokens.
arXiv Detail & Related papers (2022-04-01T17:38:39Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
tokenized image patches are fed into the Transformer-based U-shaped decoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.
We propose a novel framework that efficiently bridges a bf Convolutional neural network and a bf Transformer bf (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.