UNetFormer: A Unified Vision Transformer Model and Pre-Training
Framework for 3D Medical Image Segmentation
- URL: http://arxiv.org/abs/2204.00631v2
- Date: Tue, 5 Apr 2022 16:41:01 GMT
- Title: UNetFormer: A Unified Vision Transformer Model and Pre-Training
Framework for 3D Medical Image Segmentation
- Authors: Ali Hatamizadeh, Ziyue Xu, Dong Yang, Wenqi Li, Holger Roth and
Daguang Xu
- Abstract summary: We introduce a unified framework consisting of two architectures, dubbed UNetFormer, with a 3D Swin Transformer-based encoder and Conal Neural Network (CNN) and transformer-based decoders.
In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision.
We present a methodology for self-supervised pre-training of the encoder backbone via learning to predict randomly masked tokens.
- Score: 14.873473285148853
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision Transformers (ViT)s have recently become popular due to their
outstanding modeling capabilities, in particular for capturing long-range
information, and scalability to dataset and model sizes which has led to
state-of-the-art performance in various computer vision and medical image
analysis tasks. In this work, we introduce a unified framework consisting of
two architectures, dubbed UNetFormer, with a 3D Swin Transformer-based encoder
and Convolutional Neural Network (CNN) and transformer-based decoders. In the
proposed model, the encoder is linked to the decoder via skip connections at
five different resolutions with deep supervision. The design of proposed
architecture allows for meeting a wide range of trade-off requirements between
accuracy and computational cost. In addition, we present a methodology for
self-supervised pre-training of the encoder backbone via learning to predict
randomly masked volumetric tokens using contextual information of visible
tokens. We pre-train our framework on a cohort of $5050$ CT images, gathered
from publicly available CT datasets, and present a systematic investigation of
various components such as masking ratio and patch size that affect the
representation learning capability and performance of downstream tasks. We
validate the effectiveness of our pre-training approach by fine-tuning and
testing our model on liver and liver tumor segmentation task using the Medical
Segmentation Decathlon (MSD) dataset and achieve state-of-the-art performance
in terms of various segmentation metrics. To demonstrate its generalizability,
we train and test the model on BraTS 21 dataset for brain tumor segmentation
using MRI images and outperform other methods in terms of Dice score. Code:
https://github.com/Project-MONAI/research-contributions
Related papers
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - Promise:Prompt-driven 3D Medical Image Segmentation Using Pretrained
Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentations.
arXiv Detail & Related papers (2023-10-30T16:49:03Z) - SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical
Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only retains the preservation of both local and global context information but also is capable of capturing long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstruct the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors
in MRI Images [7.334185314342017]
We propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR)
The model extracts features at five different resolutions by utilizing shifted windows for computing self-attention.
We have participated in BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase.
arXiv Detail & Related papers (2022-01-04T18:01:34Z) - Atrous Residual Interconnected Encoder to Attention Decoder Framework
for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z) - UNETR: Transformers for 3D Medical Image Segmentation [8.59571749685388]
We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a pure transformer as the encoder to learn sequence representations of the input volume.
We have extensively validated the performance of our proposed model across different imaging modalities.
arXiv Detail & Related papers (2021-03-18T20:17:15Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.