HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image
Segmentation
- URL: http://arxiv.org/abs/2303.10333v1
- Date: Sat, 18 Mar 2023 04:43:12 GMT
- Title: HybridMIM: A Hybrid Masked Image Modeling Framework for 3D Medical Image
Segmentation
- Authors: Zhaohu Xing, Lei Zhu, Lequan Yu, Zhiheng Xing, Liang Wan
- Abstract summary: HybridMIM is a novel hybrid self-supervised learning method based on masked image modeling for 3D medical image segmentation.
We learn the semantic information of medical images at three levels: 1) pixel-level partial region prediction to reconstruct key contents of the 3D image, which largely reduces the pre-training time burden; 2) region-level patch-masking perception to learn the spatial relationship between patches in each sub-volume; and 3) sample-level dropout-based contrastive learning between samples within a mini-batch.
- The proposed framework is versatile, supporting both CNN and transformer encoder backbones, and also enables pre-training of decoders for image segmentation.
- Score: 29.15746532186427
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Masked image modeling (MIM) with transformer backbones has recently been
exploited as a powerful self-supervised pre-training technique. Existing MIM
methods mask random patches of the image and reconstruct the missing pixels,
which considers semantic information only at a low level and incurs a long
pre-training time. This paper presents HybridMIM, a novel hybrid
self-supervised learning method based on masked image modeling for 3D medical
image segmentation. Specifically, we design a two-level masking hierarchy to
specify which patches in sub-volumes are masked and how, effectively providing
constraints of higher-level semantic information. We then learn the semantic
information of medical images at three levels: 1) partial region prediction to
reconstruct key contents of the 3D image, which largely reduces the
pre-training time burden (pixel-level); 2) patch-masking perception to learn
the spatial relationship between the patches in each sub-volume (region-level);
and 3) dropout-based contrastive learning between samples within a mini-batch,
which further improves the generalization ability of the framework
(sample-level). The proposed framework is versatile, supporting both CNN and
transformer encoder backbones, and also enables pre-training of decoders for
image segmentation. We conduct comprehensive experiments
on four widely-used public medical image segmentation datasets, including
BraTS2020, BTCV, MSD Liver, and MSD Spleen. The experimental results show the
clear superiority of HybridMIM against competing supervised methods, masked
pre-training approaches, and other self-supervised methods, in terms of
quantitative metrics, timing performance and qualitative observations. The
code of HybridMIM is available at https://github.com/ge-xing/HybridMIM
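
To make the three-level design above concrete, here is a minimal PyTorch-style sketch of the two-level masking hierarchy (which sub-volumes hide how many patches, and which patches inside them) and of the sample-level dropout-based contrastive loss. The tensor shapes, masking ratios, and the `encoder(batch) -> (B, C)` interface are illustrative assumptions, not the authors' released implementation (see the GitHub link above for that).

```python
# Sketch only: assumes volume dimensions are divisible by the patch size
# and by the sub-volume grid, and that the encoder keeps dropout active.
import random

import torch
import torch.nn.functional as F


def two_level_mask(volume, sub_grid=(2, 2, 2), patch=(16, 16, 16),
                   mask_ratios=(0.25, 0.5, 0.75)):
    """Return a boolean patch mask for one 3D volume (True = masked).

    Level 1: each sub-volume draws a masking ratio from `mask_ratios`,
    i.e. *how many* of its patches are hidden.
    Level 2: *which* patches inside that sub-volume are hidden is random.
    """
    D, H, W = volume.shape[-3:]
    pd, ph, pw = patch
    grid = (D // pd, H // ph, W // pw)                     # patch grid of the volume
    mask = torch.zeros(grid, dtype=torch.bool)

    sd, sh, sw = (g // s for g, s in zip(grid, sub_grid))  # patches per sub-volume
    for i in range(sub_grid[0]):
        for j in range(sub_grid[1]):
            for k in range(sub_grid[2]):
                ratio = random.choice(mask_ratios)         # level 1: how much to mask
                n_patches = sd * sh * sw
                n_masked = int(round(ratio * n_patches))
                flat = torch.zeros(n_patches, dtype=torch.bool)
                flat[torch.randperm(n_patches)[:n_masked]] = True  # level 2: which patches
                mask[i*sd:(i+1)*sd, j*sh:(j+1)*sh, k*sw:(k+1)*sw] = flat.view(sd, sh, sw)
    return mask.flatten()                                  # one flag per patch


def dropout_contrastive_loss(encoder, batch, temperature=0.1):
    """Sample-level loss: two forward passes of the same mini-batch with
    dropout enabled give two views; matching rows are the positives
    (SimCSE-style InfoNCE)."""
    z1 = F.normalize(encoder(batch), dim=-1)               # (B, C), first dropout mask
    z2 = F.normalize(encoder(batch), dim=-1)               # (B, C), second dropout mask
    logits = z1 @ z2.t() / temperature                     # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```

The pixel-level partial region reconstruction head and the region-level patch-masking perception head described in the abstract would consume the mask produced here; they are omitted to keep the sketch short.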
Related papers
- HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training [21.444098313697044]
We propose a generative pre-training strategy based on masked image modeling and apply it to large-scale pre-training on medical images.
We employ a simple hierarchical decoder with skip-connections to achieve dense multi-scale feature reconstruction.
arXiv Detail & Related papers (2024-08-11T16:31:39Z)
- Enhanced Self-supervised Learning for Multi-modality MRI Segmentation and Classification: A Novel Approach Avoiding Model Collapse [6.3467517115551875]
Multi-modality magnetic resonance imaging (MRI) can provide complementary information for computer-aided diagnosis.
Traditional deep learning algorithms are suitable for identifying specific anatomical structures, segmenting lesions, and classifying diseases with magnetic resonance images.
Self-supervised learning (SSL) can effectively learn feature representations from unlabeled data by pre-training and is demonstrated to be effective in natural image analysis.
Most SSL methods ignore the similarity of multi-modality MRI, leading to model collapse.
We establish and validate a multi-modality MRI masked autoencoder consisting of a hybrid mask pattern (HMP) and a pyramid Barlow twin (PBT).
arXiv Detail & Related papers (2024-07-15T01:11:30Z)
- MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis [9.227314308722047]
Mask AutoEncoder (MAE) for feature pre-training can unleash the potential of ViT on various medical vision tasks.
We propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images.
arXiv Detail & Related papers (2024-04-24T01:14:33Z)
- SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images [13.971120210536995]
We introduce Scaling-up Mix with Multi-Class (SM2C) to improve the ability to learn semantic features within medical images.
By diversifying the shapes of the segmentation objects and enriching the semantic information within each sample, SM2C demonstrates its potential.
The proposed framework shows significant improvements over state-of-the-art counterparts.
arXiv Detail & Related papers (2024-03-24T04:39:40Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)