MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
- URL: http://arxiv.org/abs/2404.15580v1
- Date: Wed, 24 Apr 2024 01:14:33 GMT
- Title: MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
- Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen,
- Abstract summary: Mask AutoEncoder (MAE) for feature pre-training can unleash the potential of ViT on various medical vision tasks.
We propose a novel textitMask in Mask (MiM) pre-training framework for 3D medical images.
- Score: 9.227314308722047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the performance of downstream tasks. In this paper, we propose a novel \textit{Mask in Mask (MiM)} pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representation from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously ranging at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent level volumes to enforce anatomical similarity hierarchically. Furthermore, we adopt a hybrid backbone to enhance the hierarchical representation learning efficiently during the pre-training. MiM was pre-trained on a large scale of available 3D volumetric images, \textit{i.e.,} Computed Tomography (CT) images containing various body parts. Extensive experiments on thirteen public datasets demonstrate the superiority of MiM over other SSL methods in organ/lesion/tumor segmentation and disease classification. We further scale up the MiM to large pre-training datasets with more than 10k volumes, showing that large-scale pre-training can further enhance the performance of downstream tasks. The improvement also concluded that the research community should pay more attention to the scale of the pre-training dataset towards the healthcare foundation model for 3D medical images.
Related papers
- Domain Aware Multi-Task Pretraining of 3D Swin Transformer for T1-weighted Brain MRI [4.453300553789746]
We propose novel domain-aware multi-task learning tasks to pretrain a 3D Swin Transformer for brain magnetic resonance imaging (MRI)
Our method considers the domain knowledge in brain MRI by incorporating brain anatomy and morphology as well as standard pretext tasks adapted for 3D imaging in a contrastive learning setting.
Our method outperforms existing supervised and self-supervised methods in three downstream tasks of Alzheimer's disease classification, Parkinson's disease classification, and age prediction tasks.
arXiv Detail & Related papers (2024-10-01T05:21:02Z) - Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - Generative Enhancement for 3D Medical Images [74.17066529847546]
We propose GEM-3D, a novel generative approach to the synthesis of 3D medical images.
Our method begins with a 2D slice, noted as the informed slice to serve the patient prior, and propagates the generation process using a 3D segmentation mask.
By decomposing the 3D medical images into masks and patient prior information, GEM-3D offers a flexible yet effective solution for generating versatile 3D images.
arXiv Detail & Related papers (2024-03-19T15:57:04Z) - Enhancing Weakly Supervised 3D Medical Image Segmentation through
Probabilistic-aware Learning [52.249748801637196]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning.
Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation.
We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image
Segmentation [29.15746532186427]
HybridMIM is a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation.
We learn the semantic information of medical images at three levels, including:1) partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden.
The proposed framework is versatile to support both CNN and transformer as encoder backbones, and also enables to pre-train decoders for image segmentation.
arXiv Detail & Related papers (2023-03-18T04:43:12Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstruct the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - Masked Image Modeling Advances 3D Medical Image Analysis [0.41674286453548476]
Masked image modeling (MIM) has gained considerable attention due to its capacity to learn from vast amounts of unlabeled data.
This paper shows that MIM can also advance 3D medical images analysis in addition to natural images.
arXiv Detail & Related papers (2022-04-25T15:16:08Z) - Medical Transformer: Universal Brain Encoder for 3D MRI Analysis [1.6287500717172143]
Existing 3D-based methods have transferred the pre-trained models to downstream tasks.
They demand a massive amount of parameters to train the model for 3D medical imaging.
We propose a novel transfer learning framework, called Medical Transformer, that effectively models 3D volumetric images in the form of a sequence of 2D image slices.
arXiv Detail & Related papers (2021-04-28T08:34:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.