Masked Image Modeling Advances 3D Medical Image Analysis
- URL: http://arxiv.org/abs/2204.11716v1
- Date: Mon, 25 Apr 2022 15:16:08 GMT
- Title: Masked Image Modeling Advances 3D Medical Image Analysis
- Authors: Zekai Chen, Devansh Agarwal, Kshitij Aggarwal, Wiem Safta, Mariann
Micsinai Balan, Venkat Sethuraman, Kevin Brown
- Abstract summary: Masked image modeling (MIM) has gained considerable attention due to its capacity to learn from vast amounts of unlabeled data.
This paper shows that MIM can also advance 3D medical image analysis in addition to natural images.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, masked image modeling (MIM) has gained considerable attention due
to its capacity to learn from vast amounts of unlabeled data and has been
demonstrated to be effective on a wide variety of vision tasks involving
natural images. Meanwhile, the potential of self-supervised learning in
modeling 3D medical images is anticipated to be immense due to the high
quantities of unlabeled images, and the expense and difficulty of quality
labels. However, MIM's applicability to medical images remains uncertain. In
this paper, we demonstrate that masked image modeling approaches can also
advance 3D medical image analysis in addition to natural images. We study how
masked image modeling strategies improve performance on 3D medical image
segmentation as a representative downstream task: i) compared to naive
contrastive learning, masked image modeling approaches accelerate the
convergence of supervised training (1.40$\times$ faster) and ultimately
produce a higher dice score; ii) predicting raw voxel values with a high
masking ratio and a relatively small patch size is a non-trivial
self-supervised pretext task for medical image modeling; iii) a lightweight
decoder or projection head design for reconstruction is effective for masked
image modeling on 3D medical images, which speeds up training and reduces
cost; iv) finally, we also investigate the effectiveness of MIM methods under
practical scenarios with different image resolutions and labeled data ratios.
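The pretext task in point (ii), regressing raw voxel values of randomly masked 3D patches, and the dice score used as the downstream metric can be sketched as follows. This is a minimal illustration under assumed defaults (patch size 8, masking ratio 0.75, zero-fill for masked voxels), not the paper's actual implementation:

```python
import numpy as np

def mask_3d_volume(volume, patch_size=8, mask_ratio=0.75, rng=None):
    """Randomly mask non-overlapping 3D patches of a volume.

    Sketch of the MIM masking step: hide a high fraction of small
    patches; an encoder-decoder would then be trained to regress the
    raw voxel values inside the hidden patches. Volume dimensions are
    assumed divisible by patch_size.
    """
    rng = np.random.default_rng(rng)
    d, h, w = (s // patch_size for s in volume.shape)
    n_patches = d * h * w
    n_masked = int(round(mask_ratio * n_patches))
    # Pick which patches to hide, uniformly at random without replacement.
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    mask = mask.reshape(d, h, w)
    # Upsample the patch-level mask to voxel resolution, then zero out.
    voxel_mask = np.repeat(np.repeat(np.repeat(mask, patch_size, 0),
                                     patch_size, 1), patch_size, 2)
    masked_volume = volume.copy()
    masked_volume[voxel_mask] = 0.0
    return masked_volume, voxel_mask

def dice_score(pred, target, eps=1e-6):
    """Dice coefficient between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```

With a 32^3 volume and these defaults, 48 of the 64 patches are hidden; the reconstruction loss would be computed only over the masked voxels.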
Related papers
- Domain Aware Multi-Task Pretraining of 3D Swin Transformer for T1-weighted Brain MRI [4.453300553789746]
We propose novel domain-aware multi-task learning tasks to pretrain a 3D Swin Transformer for brain magnetic resonance imaging (MRI)
Our method considers the domain knowledge in brain MRI by incorporating brain anatomy and morphology as well as standard pretext tasks adapted for 3D imaging in a contrastive learning setting.
Our method outperforms existing supervised and self-supervised methods on three downstream tasks: Alzheimer's disease classification, Parkinson's disease classification, and age prediction.
arXiv Detail & Related papers (2024-10-01T05:21:02Z) - Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach organizes various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes [2.8498944632323755]
We propose an end-to-end hybrid architecture for medical image segmentation.
We use Hamiltonian Variational Autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images.
Our architecture operates on a slice-by-slice basis to segment 3D volumes, capitalizing on the richly augmented dataset.
arXiv Detail & Related papers (2024-06-17T15:42:08Z) - MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis [9.227314308722047]
Masked AutoEncoder (MAE) feature pre-training can unleash the potential of ViT on various medical vision tasks.
We propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images.
arXiv Detail & Related papers (2024-04-24T01:14:33Z) - M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models [49.5030774873328]
Previous research has primarily focused on 2D medical images, leaving 3D images under-explored, despite their richer spatial information.
We present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs.
We also introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks.
arXiv Detail & Related papers (2024-03-31T06:55:12Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing informative patches according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - Medical Transformer: Universal Brain Encoder for 3D MRI Analysis [1.6287500717172143]
Existing 3D-based methods transfer pre-trained models to downstream tasks,
but they demand a massive number of parameters to train the model for 3D medical imaging.
We propose a novel transfer learning framework, called Medical Transformer, that effectively models 3D volumetric images in the form of a sequence of 2D image slices.
arXiv Detail & Related papers (2021-04-28T08:34:21Z) - Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing.
In this paper, we develop a novel generative method named generative adversarial U-Net.
Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.