Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation
- URL: http://arxiv.org/abs/2406.10519v2
- Date: Mon, 15 Jul 2024 20:35:00 GMT
- Title: Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation
- Authors: Pengfei Gu, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Z. Chen
- Abstract summary: Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems.
Existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack the ability to capture geometric shape and spatial information.
We propose a novel extension of known MAEs for self pre-training (i.e., models pre-trained on the same target dataset) for 3D medical image segmentation.
- Score: 16.753957522664713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. However, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack the ability to capture geometric shape and spatial information, which is critical for medical image segmentation tasks. In this paper, we propose a novel extension of known MAEs for self pre-training (i.e., models pre-trained on the same target dataset) for 3D medical image segmentation. (1) We propose a new topological loss that preserves geometric shape information by computing topological signatures of both the input and reconstructed volumes. (2) We introduce a pre-text task that predicts the positions of the centers and eight corners of 3D crops, enabling the MAE to aggregate spatial information. (3) We extend the MAE pre-training strategy to a hybrid state-of-the-art (SOTA) medical image segmentation architecture and co-pretrain it alongside the ViT. (4) We develop a fine-tuned model for downstream segmentation tasks by complementing the pre-trained ViT encoder with our pre-trained SOTA model. Extensive experiments on five public 3D segmentation datasets show the effectiveness of our new approach.
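The position pre-text task in item (2) of the abstract can be illustrated with a small sketch of how regression targets for one 3D crop might be built. The abstract does not say how positions are encoded, so the function name `crop_position_targets`, the (z, y, x) ordering, and the normalization to [0, 1] are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def crop_position_targets(start, size, volume_shape):
    """Return normalized (z, y, x) positions of a crop's center and 8 corners.

    start:        (z, y, x) index of the crop's first voxel in the volume
    size:         (d, h, w) extent of the crop in voxels
    volume_shape: (D, H, W) shape of the full volume
    """
    for s, n, full in zip(start, size, volume_shape):
        if s < 0 or n <= 0 or s + n > full:
            raise ValueError("crop must lie inside the volume")

    # Corner voxels: every combination of first/last index along each axis.
    corners = [
        tuple(s + bit * (n - 1) for s, n, bit in zip(start, size, bits))
        for bits in product((0, 1), repeat=3)
    ]
    center = tuple(s + (n - 1) / 2 for s, n in zip(start, size))

    # Assumption: scale coordinates to [0, 1] so targets are comparable
    # across volumes of different sizes.
    def normalize(point):
        return tuple(
            p / (full - 1) if full > 1 else 0.0
            for p, full in zip(point, volume_shape)
        )

    # Index 0 is the center; indices 1..8 are the corners.
    return [normalize(center)] + [normalize(c) for c in corners]


targets = crop_position_targets(
    start=(0, 16, 32), size=(64, 64, 64), volume_shape=(128, 128, 128)
)
```

A network head trained on this task would regress these nine 3-vectors from the crop's features; the choice of regression loss is likewise left open by the abstract.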
Related papers
- Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis [9.355513913682794]
Current biomedical foundation models struggle to generalize as public 3D datasets are small.
We propose a data engine that synthesizes highly variable training samples that enable generalization to new biomedical contexts.
To then train a single 3D network for any voxel-level task, we develop a contrastive learning method that pretrains the network to be stable against nuisance imaging variation simulated by the data engine.
arXiv Detail & Related papers (2024-11-04T18:40:46Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis [9.472502717128556]
Masked AutoEncoder (MAE) for feature pre-training can unleash the potential of ViT on various medical vision tasks.
We propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images.
arXiv Detail & Related papers (2024-04-24T01:14:33Z)
- Primitive Geometry Segment Pre-training for 3D Medical Image Segmentation [12.251689154843342]
We present the Primitive Geometry Segment Pre-training (PrimGeoSeg) method to enable the learning of 3D semantic features.
PrimGeoSeg performs more accurate and efficient 3D medical image segmentation without manual data collection and annotation.
arXiv Detail & Related papers (2024-01-08T04:37:35Z)
- ProMISe: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentations.
arXiv Detail & Related papers (2023-10-30T16:49:03Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- UNetFormer: A Unified Vision Transformer Model and Pre-Training Framework for 3D Medical Image Segmentation [14.873473285148853]
We introduce a unified framework consisting of two architectures, dubbed UNetFormer, with a 3D Swin Transformer-based encoder and Convolutional Neural Network (CNN) and transformer-based decoders.
In the proposed model, the encoder is linked to the decoder via skip connections at five different resolutions with deep supervision.
We present a methodology for self-supervised pre-training of the encoder backbone via learning to predict randomly masked tokens.
arXiv Detail & Related papers (2022-04-01T17:38:39Z)
- Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.