Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image
Analysis
- URL: http://arxiv.org/abs/2111.14791v1
- Date: Mon, 29 Nov 2021 18:45:20 GMT
- Title: Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image
Analysis
- Authors: Yucheng Tang, Dong Yang, Wenqi Li, Holger Roth, Bennett Landman,
Daguang Xu, Vishwesh Nath and Ali Hatamizadeh
- Abstract summary: We introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis.
We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images.
Our model is currently the state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD and BTCV datasets.
- Score: 7.214195462426705
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision Transformers (ViTs) have shown great performance in self-supervised
learning of global and local representations that can be transferred to
downstream applications. Inspired by these results, we introduce a novel
self-supervised learning framework with tailored proxy tasks for medical image
analysis. Specifically, we propose: (i) a new 3D transformer-based model,
dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for
self-supervised pre-training; (ii) tailored proxy tasks for learning the
underlying pattern of human anatomy. We demonstrate successful pre-training of
the proposed model on 5,050 publicly available computed tomography (CT) images
from various body organs. The effectiveness of our approach is validated by
fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV)
Segmentation Challenge with 13 abdominal organs, and on segmentation tasks from the
Medical Segmentation Decathlon (MSD) dataset. Our model is currently the
state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD
and BTCV datasets. Code: https://monai.io/research/swin-unetr
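For concreteness, here is a minimal fine-tuning sketch assuming the Swin UNETR implementation distributed with MONAI (linked above); the 14 output channels (13 BTCV organs plus background) and the checkpoint filename are illustrative assumptions, and argument names may vary across MONAI releases.

```python
import torch
from monai.networks.nets import SwinUNETR

# Hierarchical Swin encoder + UNETR-style decoder for 3D segmentation.
model = SwinUNETR(
    img_size=(96, 96, 96),  # training patch size
    in_channels=1,          # single-channel CT input
    out_channels=14,        # assumed: 13 BTCV abdominal organs + background
    feature_size=48,        # width of the Swin encoder stages
)

# Hypothetical filename for self-supervised encoder weights:
# state = torch.load("swin_unetr_pretrained.pt")
# model.load_state_dict(state, strict=False)

x = torch.randn(1, 1, 96, 96, 96)  # (batch, channel, D, H, W)
logits = model(x)                  # (1, 14, 96, 96, 96) voxel-wise logits
```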
Related papers
- Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation [16.753957522664713]
Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems.
Existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack the ability to capture geometric shape and spatial information.
We propose a novel extension of known MAEs for self pre-training (i.e., models pre-trained on the same target dataset) for 3D medical image segmentation.
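As a rough illustration of the MAE-style masking these methods build on (the topology- and spatiality-aware components of this paper are not reproduced here), a generic 3D patch-masking step might look like:

```python
import torch
import torch.nn.functional as F

def mask_3d_patches(vol, patch=16, mask_ratio=0.75):
    """Zero out a random subset of non-overlapping cubic patches."""
    b, c, d, h, w = vol.shape
    pd, ph, pw = d // patch, h // patch, w // patch
    n = pd * ph * pw
    # Random permutation per sample; True marks patches kept visible.
    keep = torch.rand(b, n).argsort(dim=1) >= int(n * mask_ratio)
    mask = keep.view(b, 1, pd, ph, pw).float()
    mask = F.interpolate(mask, scale_factor=patch, mode="nearest")
    return vol * mask, mask

masked, mask = mask_3d_patches(torch.randn(2, 1, 96, 96, 96))
# An MAE encodes the visible patches and reconstructs the masked ones.
```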
arXiv Detail & Related papers (2024-06-15T06:15:17Z)
- Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis [3.208731414009847]
Transfer learning involves pre-training deep learning models on a large corpus of data for adaptation to specific tasks.
We benchmarked vision transformers (ViTs) and convolutional neural networks (CNNs) with varied upstream pre-training approaches.
The resulting pre-trained models can be adapted to a range of downstream tasks, even when training data for the target task is limited.
arXiv Detail & Related papers (2023-09-09T00:33:23Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
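A hedged sketch of the disruption step described above; the masking sizes and the choice of perturbation are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def disrupt(vol, patch=8, n_blocks=20, noise_std=0.1):
    """Additive low-level noise plus randomly placed local masks."""
    out = vol + noise_std * torch.randn_like(vol)   # low-level perturbation
    _, _, d, h, w = vol.shape
    for _ in range(n_blocks):                        # local masking
        z = torch.randint(0, d - patch, (1,)).item()
        y = torch.randint(0, h - patch, (1,)).item()
        x = torch.randint(0, w - patch, (1,)).item()
        out[..., z:z+patch, y:y+patch, x:x+patch] = 0
    return out

vol = torch.randn(2, 1, 64, 64, 64)
corrupted = disrupt(vol)
# loss = F.mse_loss(autoencoder(corrupted), vol)  # reconstruct the original
```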
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset [14.823114726604853]
We propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models.
VF forces the model to predict the fusion coefficient of each voxel, which is formulated as a self-supervised segmentation task without manual annotations.
Experiments with different downstream segmentation targets, including head and neck organs and thoracic/abdominal organs, showed that our pretrained model largely outperformed training from scratch.
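A rough sketch of the Volume Fusion idea described above, where the block size and number of coefficient classes are assumptions:

```python
import torch

def volume_fusion(v1, v2, n_classes=5, block=16):
    """Blend two volumes with a discretized, voxel-wise coefficient map."""
    b, c, d, h, w = v1.shape
    # Piecewise-constant coefficient classes on a coarse grid.
    coarse = torch.randint(0, n_classes, (b, 1, d // block, h // block, w // block))
    labels = (coarse.repeat_interleave(block, -1)
                    .repeat_interleave(block, -2)
                    .repeat_interleave(block, -3))
    alpha = labels.float() / (n_classes - 1)  # class index -> coefficient in [0, 1]
    fused = alpha * v1 + (1 - alpha) * v2     # self-supervised input
    return fused, labels.squeeze(1)           # voxel-wise class target

v1, v2 = torch.randn(2, 1, 64, 64, 64), torch.randn(2, 1, 64, 64, 64)
x, y = volume_fusion(v1, v2)
# A segmentation head is then trained with cross-entropy on (x, y).
```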
arXiv Detail & Related papers (2023-06-29T13:22:13Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
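In embedding terms, such a retrieval benchmark reduces to ranking images by similarity to a query text in the shared space; a toy sketch with placeholder embeddings, not the paper's models:

```python
import torch
import torch.nn.functional as F

def retrieve(text_emb, image_embs, k=5):
    """Indices of the k images closest to the text embedding (cosine)."""
    t = F.normalize(text_emb, dim=-1)      # (dim,)
    im = F.normalize(image_embs, dim=-1)   # (n_images, dim)
    return (im @ t).topk(k).indices

# Stand-ins for one report embedding and 100 chest X-ray embeddings:
top5 = retrieve(torch.randn(512), torch.randn(100, 512))
```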
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation [14.873473285148853]
We introduce a unified framework, dubbed UNetFormer, consisting of two architectures with a 3D Swin Transformer-based encoder and either Convolutional Neural Network (CNN)-based or transformer-based decoders.
In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision.
We present a methodology for self-supervised pre-training of the encoder backbone via learning to predict randomly masked tokens.
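A schematic five-resolution encoder-decoder with skip connections and deep supervision, as described above; widths, depths, and heads below are illustrative assumptions, not the UNetFormer architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet3D(nn.Module):
    def __init__(self, widths=(8, 16, 32, 64, 128), n_classes=14):
        super().__init__()
        ins = (1,) + widths[:-1]
        self.enc = nn.ModuleList(
            nn.Conv3d(i, o, 3, stride=(1 if i == 1 else 2), padding=1)
            for i, o in zip(ins, widths))
        self.up = nn.ModuleList(
            nn.ConvTranspose3d(widths[k], widths[k - 1], 2, stride=2)
            for k in range(4, 0, -1))
        self.heads = nn.ModuleList(  # one logit head per decoder resolution
            nn.Conv3d(w, n_classes, 1) for w in widths[-2::-1])

    def forward(self, x):
        skips = []
        for conv in self.enc:                 # five encoder resolutions
            x = F.relu(conv(x))
            skips.append(x)
        outs = []
        for up, head, skip in zip(self.up, self.heads, reversed(skips[:-1])):
            x = up(x) + skip                  # skip connection
            outs.append(head(x))              # deep-supervision output
        return outs                           # coarse-to-fine logits

outs = TinyUNet3D()(torch.randn(1, 1, 64, 64, 64))
# Deep supervision: sum losses over `outs` against resized targets.
```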
arXiv Detail & Related papers (2022-04-01T17:38:39Z)
- Medical Transformer: Universal Brain Encoder for 3D MRI Analysis [1.6287500717172143]
Existing 3D-based methods transfer pre-trained models to downstream tasks, but they demand a massive number of parameters to model 3D medical images.
We propose a novel transfer learning framework, called Medical Transformer, that effectively models 3D volumetric images in the form of a sequence of 2D image slices.
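A toy version of the slice-sequence idea: embed each 2D slice with a shared encoder, then model the slice sequence with attention. The encoder and dimensions are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SliceSequenceEncoder(nn.Module):
    """Embed each 2D slice, then model the slice sequence with attention."""
    def __init__(self, emb=256, n_slices=64):
        super().__init__()
        self.slice_encoder = nn.Sequential(   # stand-in 2D slice encoder
            nn.Conv2d(1, 16, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 16, emb))
        self.pos = nn.Parameter(torch.zeros(1, n_slices, emb))
        layer = nn.TransformerEncoderLayer(emb, nhead=8, batch_first=True)
        self.seq = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, vol):                   # vol: (B, 1, D, H, W)
        b, c, d, h, w = vol.shape
        slices = vol.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w)
        tokens = self.slice_encoder(slices).view(b, d, -1)
        return self.seq(tokens + self.pos)    # (B, D, emb) slice features

feats = SliceSequenceEncoder()(torch.randn(2, 1, 64, 128, 128))
```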
arXiv Detail & Related papers (2021-04-28T08:34:21Z)
- A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis.
Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains.
Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
- Modelling the Distribution of 3D Brain MRI using a 2D Slice VAE [66.63629641650572]
We propose a method to model the distribution of 3D MR brain volumes by combining a 2D slice VAE with a Gaussian model that captures the relationships between slices.
We also introduce a novel evaluation method for generated volumes that quantifies how well their segmentations match those of true brain anatomy.
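The sampling side of such a model might look like the following toy sketch, where an AR(1) Gaussian stands in for the paper's learned model of inter-slice structure:

```python
import torch

def sample_correlated_latents(n_slices=64, z_dim=32, rho=0.9):
    """z_t = rho * z_{t-1} + sqrt(1 - rho^2) * eps, so nearby slices agree."""
    z = [torch.randn(z_dim)]
    for _ in range(n_slices - 1):
        z.append(rho * z[-1] + (1 - rho ** 2) ** 0.5 * torch.randn(z_dim))
    return torch.stack(z)  # (n_slices, z_dim)

z = sample_correlated_latents()
# volume = torch.stack([decoder(z_t) for z_t in z])  # shared 2D VAE decoder
```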
arXiv Detail & Related papers (2020-07-09T13:23:15Z)