MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained
on a Large-Scale Unannotated Dataset
- URL: http://arxiv.org/abs/2306.16925v1
- Date: Thu, 29 Jun 2023 13:22:13 GMT
- Title: MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained
on a Large-Scale Unannotated Dataset
- Authors: Guotai Wang, Jianghao Wu, Xiangde Luo, Xinglong Liu, Kang Li, Shaoting
Zhang
- Abstract summary: We propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models.
VF forces the model to predict the fusion coefficient of each voxel, which is formulated as a self-supervised segmentation task without manual annotations.
Experiments with different downstream segmentation targets, including head and neck organs and thoracic/abdominal organs, showed that our pretrained model largely outperformed training from scratch.
- Score: 14.823114726604853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretraining with large-scale 3D volumes has the potential to improve the
segmentation performance on a target medical image dataset where the training
images and annotations are limited. Due to the high cost of acquiring
pixel-level segmentation annotations on the large-scale pretraining dataset,
pretraining with unannotated images is highly desirable. In this work, we
propose a novel self-supervised learning strategy named Volume Fusion (VF) for
pretraining 3D segmentation models. It fuses several random patches from a
foreground sub-volume into a background sub-volume based on a predefined set of
discrete fusion coefficients, and forces the model to predict the fusion
coefficient of each voxel, which is formulated as a self-supervised
segmentation task without manual annotations. Additionally, we propose a novel
network architecture based on parallel convolution and transformer blocks that
is suitable to be transferred to different downstream segmentation tasks with
various scales of organs and lesions. The proposed model was pretrained with
110k unannotated 3D CT volumes, and experiments with different downstream
segmentation targets, including head and neck organs and thoracic/abdominal
organs, showed that our pretrained model largely outperformed training from
scratch and
several state-of-the-art self-supervised training methods and segmentation
models. The code and pretrained model are available at
https://github.com/openmedlab/MIS-FM.
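
To make the Volume Fusion pretext task concrete, below is a minimal NumPy sketch of how one pretraining pair could be generated: random patches of a foreground sub-volume are blended into a background sub-volume with a coefficient drawn from a discrete set, and the per-voxel label is that coefficient's index. The function name, patch-size range, and number of coefficient classes are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def volume_fusion_sample(foreground, background, num_patches=10,
                         num_classes=5, patch_range=(8, 32), rng=None):
    """Generate one Volume Fusion pretraining pair (fused volume, voxel labels).

    Patches of `foreground` are blended into `background` with a coefficient
    drawn from a discrete set; the label of each voxel is the index of the
    coefficient applied to it (0 = pure background). Hyperparameters here are
    illustrative, not the paper's exact settings.
    """
    assert foreground.shape == background.shape
    rng = rng or np.random.default_rng()
    fused = background.astype(np.float32).copy()
    label = np.zeros(background.shape, dtype=np.int64)
    coeffs = np.linspace(0.0, 1.0, num_classes)  # e.g. {0, 0.25, 0.5, 0.75, 1}
    for _ in range(num_patches):
        # Random patch size and location (assumes volume dims >= max patch size).
        size = rng.integers(patch_range[0], patch_range[1] + 1, size=3)
        corner = [rng.integers(0, s - d + 1) for s, d in zip(background.shape, size)]
        region = tuple(slice(c, c + d) for c, d in zip(corner, size))
        k = int(rng.integers(1, num_classes))  # nonzero coefficient index
        fused[region] = coeffs[k] * foreground[region] \
            + (1 - coeffs[k]) * background[region]
        label[region] = k
    return fused, label
```

A standard segmentation network can then be trained to predict `label` from `fused` with an ordinary segmentation loss (e.g. Dice plus cross-entropy), exactly as in a supervised task but with labels that cost nothing to produce.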
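The parallel convolution-transformer design can be pictured with a toy PyTorch block like the one below: a 3D convolutional branch and a self-attention branch process the same feature map in parallel, and their outputs are merged. This is only a sketch of the general idea; the actual MIS-FM network, its merge rule, normalization, and attention scope should be taken from the paper and the released code, and full-volume attention as written here would be memory-prohibitive at realistic resolutions.

```python
import torch
import torch.nn as nn

class ParallelConvTransformerBlock(nn.Module):
    """Toy parallel conv + transformer block (illustrative, not MIS-FM's exact design)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Convolution branch: local 3D context.
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.LeakyReLU(inplace=True),
        )
        # Attention branch: global context over all voxels (toy scale only).
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, D, H, W)
        conv_out = self.conv(x)
        b, c, d, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, D*H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        attn_out = attn_out.transpose(1, 2).reshape(b, c, d, h, w)
        return conv_out + attn_out  # merge the two parallel branches

# Example: channels must be divisible by num_heads.
block = ParallelConvTransformerBlock(channels=32)
y = block(torch.randn(1, 32, 8, 8, 8))  # -> torch.Size([1, 32, 8, 8, 8])
```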
Related papers
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage several relatively small, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter a task discrepancy, because pretraining is formulated as an image classification or object discrimination task.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
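As an illustration of that voting step, here is a minimal NumPy sketch of majority-vote fusion over per-point labels from several models; the function name and the assumption that the 2D masks have already been projected onto the 3D points are ours, not the paper's.

```python
import numpy as np

def fuse_pseudo_labels(per_model_labels):
    """Majority-vote fusion of semantic pseudo labels.

    `per_model_labels` is an (M, N) integer array holding the labels that M
    foundation models assign to N 3D points (after projecting their 2D masks
    onto the point cloud, which is outside this sketch). Returns one fused
    label per point.
    """
    per_model_labels = np.asarray(per_model_labels)
    num_classes = int(per_model_labels.max()) + 1
    # Count the votes each class receives at every point, then take the winner.
    votes = np.apply_along_axis(np.bincount, 0, per_model_labels,
                                minlength=num_classes)  # (num_classes, N)
    return votes.argmax(axis=0)
```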
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- ProMISe: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models [13.08275555017179]
We propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt.
We evaluate our model on two public datasets for colon and pancreas tumor segmentations.
arXiv Detail & Related papers (2023-10-30T16:49:03Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis [7.214195462426705]
We introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis.
We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images.
Our model is currently the state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD and BTCV datasets.
arXiv Detail & Related papers (2021-11-29T18:45:20Z)
- Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation [10.634870214944055]
In medical image segmentation, supervised deep networks' success comes at the cost of requiring abundant labeled data.
We propose a novel volumetric self-supervised learning for data augmentation capable of synthesizing volumetric image-segmentation pairs.
Our central idea combines one-shot generative learning with the proposed self-supervised training strategy.
arXiv Detail & Related papers (2021-10-05T15:28:42Z)
- Bidirectional RNN-based Few Shot Learning for 3D Medical Image Segmentation [11.873435088539459]
We propose a 3D few shot segmentation framework for accurate organ segmentation using limited training samples of the target organ annotation.
A U-Net-like network is designed to predict segmentation by learning the relationship between 2D slices of the support data and a query image.
We evaluate our proposed model using three 3D CT datasets with annotations of different organs.
arXiv Detail & Related papers (2020-11-19T01:44:55Z)
- Explanation-Guided Training for Cross-Domain Few-Shot Classification [96.12873073444091]
The cross-domain few-shot classification task (CD-FSC) combines few-shot classification with the requirement to generalize across domains represented by different datasets.
We introduce a novel training approach for existing FSC models.
We show that explanation-guided training effectively improves the model generalization.
arXiv Detail & Related papers (2020-07-17T07:28:08Z)
- Modelling the Distribution of 3D Brain MRI using a 2D Slice VAE [66.63629641650572]
We propose a method to model the distribution of 3D MR brain volumes by combining a 2D slice VAE with a Gaussian model that captures the relationships between slices.
We also introduce a novel evaluation method for generated volumes that quantifies how well their segmentations match those of true brain anatomy.
arXiv Detail & Related papers (2020-07-09T13:23:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.