Self-supervised 3D anatomy segmentation using self-distilled masked
image transformer (SMIT)
- URL: http://arxiv.org/abs/2205.10342v1
- Date: Fri, 20 May 2022 17:55:14 GMT
- Title: Self-supervised 3D anatomy segmentation using self-distilled masked
image transformer (SMIT)
- Authors: Jue Jiang, Neelam Tyagi, Kathryn Tringale, Christopher Crane, Harini
Veeraraghavan
- Abstract summary: Self-supervised learning has demonstrated success in medical image segmentation using convolutional networks.
We show our approach is more accurate and requires fewer fine-tuning datasets than other pretext tasks.
- Score: 2.7298989068857487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision transformers, with their ability to more efficiently model long-range
context, have demonstrated impressive accuracy gains in several computer vision
and medical image analysis tasks including segmentation. However, such methods
need large labeled datasets for training, which is hard to obtain for medical
image analysis. Self-supervised learning (SSL) has demonstrated success in
medical image segmentation using convolutional networks. In this work, we
developed a self-distillation learning with masked image modeling method to
perform SSL for vision transformers (SMIT), applied to 3D multi-organ
segmentation from CT
and MRI. Our contribution is a dense pixel-wise regression within masked
patches called masked image prediction, which we combined with masked patch
token distillation as pretext task to pre-train vision transformers. We show
our approach is more accurate and requires fewer fine-tuning datasets than
other pretext tasks. Unlike prior medical image methods, which typically
pre-train on image sets drawn from the same disease sites and imaging
modalities as the target tasks, we pre-trained on 3,643 CT scans (602,708
images) spanning head and neck, lung, and kidney cancers as well as COVID-19,
and then applied the model to segmentation of abdominal organs from MRI scans
of pancreatic cancer patients and of 13 abdominal organs from a publicly
available CT dataset. Our
method showed clear accuracy improvement (average DSC of 0.875 from MRI and
0.878 from CT) with reduced requirement for fine-tuning datasets over commonly
used pretext tasks. Extensive comparisons against multiple current SSL methods
were done. Code will be made available upon acceptance for publication.
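Since the code was not yet released at the time of writing, the following is a minimal PyTorch-style sketch of the two pretext objectives described above: masked image prediction as dense pixel regression restricted to masked patches, and masked patch token distillation as a cross-entropy between teacher and student token distributions. The tensor shapes, temperatures, and equal loss weighting are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of the two SMIT pretext losses (not the authors' code).
# Assumes a student/teacher ViT pair that emits per-patch outputs.
import torch.nn.functional as F

def smit_pretext_loss(pred_pixels, true_pixels, mask,
                      student_tokens, teacher_tokens,
                      tau_s=0.1, tau_t=0.04):
    """pred_pixels / true_pixels: (B, N, P) voxel values per patch;
    mask: (B, N), 1 where a patch was masked;
    student_tokens / teacher_tokens: (B, N, K) projected token logits."""
    mask = mask.float()
    denom = mask.sum().clamp(min=1.0)

    # Masked image prediction: dense regression on masked patches only.
    mip = (pred_pixels - true_pixels).abs().mean(dim=-1)        # (B, N)
    mip_loss = (mip * mask).sum() / denom

    # Masked patch token distillation: the student matches the frozen
    # teacher's sharpened token distribution on the same masked patches.
    t = F.softmax(teacher_tokens.detach() / tau_t, dim=-1)
    log_s = F.log_softmax(student_tokens / tau_s, dim=-1)
    mptd = -(t * log_s).sum(dim=-1)                             # (B, N)
    mptd_loss = (mptd * mask).sum() / denom

    return mip_loss + mptd_loss
```

In self-distillation setups of this kind, the teacher is typically an exponential moving average of the student and receives the unmasked view, which is why its tokens are detached here.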
Related papers
- CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates that it can identify organs and abnormalities in a zero-shot manner using natural language.
arXiv Detail & Related papers (2024-04-23T17:59:01Z)
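CT-GLIP's organ-level image-text pairing suggests a CLIP-style contrastive objective. The sketch below is a generic symmetric InfoNCE over organ-level embeddings; the function name, shapes, and temperature are illustrative assumptions, not details taken from the paper.

```python
# Generic CLIP-style symmetric contrastive loss over organ-level
# image/text embeddings (an assumed form of CT-GLIP's objective).
import torch
import torch.nn.functional as F

def organ_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, D) embeddings of matched organ crops and
    report sentences; row i of each tensor forms a positive pair."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarities
    labels = torch.arange(img.size(0), device=img.device)
    # Symmetric InfoNCE: match images to texts and texts to images.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))
```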
- M3BUNet: Mobile Mean Max UNet for Pancreas Segmentation on CT-Scans [25.636974007788986]
We propose M3BUNet, a fusion of MobileNet and U-Net neural networks, equipped with a novel Mean-Max (MM) attention that operates in two stages to gradually segment pancreas CT images.
For the fine segmentation stage, we found that applying a wavelet decomposition filter to create multi-input images enhances pancreas segmentation performance.
Our approach demonstrates a considerable performance improvement, achieving an average Dice Similarity Coefficient (DSC) value of up to 89.53% and an Intersection over Union (IoU) score of up to 81.16% for the NIH pancreas dataset.
arXiv Detail & Related papers (2024-01-18T23:10:08Z)
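M3BUNet's wavelet step can be approximated with a standard 2D discrete wavelet transform. The snippet below uses PyWavelets to stack the DWT sub-bands of a CT slice into a multi-channel input, which is one plausible reading of "multi-input images", not the authors' exact pipeline.

```python
# One plausible reading of M3BUNet's wavelet-based multi-input images:
# stack the DWT sub-bands of a CT slice as extra input channels.
import numpy as np
import pywt

def wavelet_multi_input(ct_slice, wavelet="haar"):
    """ct_slice: 2D array (H, W). Returns (4, H/2, W/2): the
    approximation band plus horizontal/vertical/diagonal detail bands."""
    cA, (cH, cV, cD) = pywt.dwt2(ct_slice, wavelet)
    return np.stack([cA, cH, cV, cD], axis=0)

# Example: a 512x512 slice becomes a 4-channel 256x256 network input.
bands = wavelet_multi_input(np.random.randn(512, 512).astype(np.float32))
print(bands.shape)  # (4, 256, 256)
```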
- MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework named MA-SAM.
Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z)
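A common realization of the 3D adapters that MA-SAM injects is a bottleneck with a 3D convolution between down- and up-projections. The module below is such a generic adapter with a residual connection, an assumed design rather than the paper's exact block.

```python
# Generic bottleneck 3D adapter (an assumed design, not MA-SAM's exact
# block): down-project, mix along the third dimension with a 3D
# convolution, up-project, and add a residual connection.
import torch.nn as nn

class Adapter3D(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.conv3d = nn.Conv3d(bottleneck, bottleneck,
                                kernel_size=3, padding=1)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        """x: (B, D, H, W, C) tokens of a 2D backbone stacked over D
        slices; in PEFT, only adapter parameters would be trained."""
        h = self.act(self.down(x))            # (B, D, H, W, b)
        h = h.permute(0, 4, 1, 2, 3)          # to (B, b, D, H, W)
        h = self.act(self.conv3d(h))          # third-dimension mixing
        h = h.permute(0, 2, 3, 4, 1)          # back to (B, D, H, W, b)
        return x + self.up(h)                 # residual connection
```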
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
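Disruptive Autoencoders corrupt the input with local masking plus low-level perturbations. The snippet below is a minimal sketch of such a corruption pipeline, using random patch zeroing and Gaussian noise as an assumed stand-in for the paper's perturbations, with L1 reconstruction as the training signal.

```python
# Minimal sketch in the spirit of Disruptive Autoencoders: local patch
# masking plus a low-level perturbation (Gaussian noise is an assumed
# stand-in), trained with L1 reconstruction of the clean volume.
import torch
import torch.nn.functional as F

def disrupt(volume, patch=16, mask_ratio=0.5, noise_std=0.1):
    """volume: (B, 1, D, H, W). Zeros random in-plane patches and adds
    Gaussian noise to the remaining voxels."""
    B, _, _, H, W = volume.shape
    grid = torch.rand(B, 1, 1, H // patch, W // patch,
                      device=volume.device) < mask_ratio
    mask = grid.repeat_interleave(patch, dim=-2)
    mask = mask.repeat_interleave(patch, dim=-1)   # (B, 1, 1, H, W)
    noisy = volume + noise_std * torch.randn_like(volume)
    return torch.where(mask, torch.zeros_like(volume), noisy)

def reconstruction_loss(decoder_out, original):
    # The network must undo both disruptions to recover the original.
    return F.l1_loss(decoder_out, original)
```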
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing informative patches, selected according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
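The "gradient metrics" in the Attentive Symmetric Autoencoder suggest steering reconstruction toward patches with strong intensity gradients. The sketch below scores non-overlapping patches by mean gradient magnitude and returns the top-k indices; this is one speculative interpretation, not the paper's procedure.

```python
# Speculative sketch of gradient-metric patch selection: score each
# patch by its mean intensity-gradient magnitude and keep the top-k
# most informative patches for reconstruction.
import torch.nn.functional as F

def informative_patches(img, patch=16, k=64):
    """img: (B, 1, H, W). Returns (B, k) indices of the k patches with
    the highest mean gradient magnitude."""
    # Finite-difference gradients, padded back to H x W.
    gx = F.pad(img[:, :, :, 1:] - img[:, :, :, :-1], (0, 1)).abs()
    gy = F.pad(img[:, :, 1:, :] - img[:, :, :-1, :], (0, 0, 0, 1)).abs()
    grad = gx + gy                                   # (B, 1, H, W)
    # Mean gradient per non-overlapping patch, flattened to (B, N).
    score = F.avg_pool2d(grad, patch).flatten(1)
    return score.topk(k, dim=1).indices
```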
- Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation [37.25161294917211]
Masked Autoencoder (MAE) has been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis.
We investigate a self pre-training paradigm with MAE for medical image analysis tasks.
arXiv Detail & Related papers (2022-03-10T16:22:38Z)
- CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation [56.622832383316215]
We propose a novel approach to translate unpaired contrast computed tomography (CT) scans to non-contrast CT scans.
Our approach is based on cycle-consistent generative adversarial convolutional transformers, for short, CyTran.
Our empirical results show that CyTran outperforms all competing methods.
arXiv Detail & Related papers (2021-10-12T23:25:03Z)
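The "cycle-consistent" part of CyTran can be illustrated by the standard cycle-consistency loss: translating a scan to the other contrast domain and back should recover the input. The sketch below shows only this term; the adversarial and multi-level consistency losses of the paper are omitted.

```python
# Standard cycle-consistency loss illustrating the cycle-consistent
# part of CyTran (adversarial and multi-level terms omitted).
import torch.nn.functional as F

def cycle_loss(G_c2n, G_n2c, contrast_ct, noncontrast_ct):
    """G_c2n / G_n2c: generators mapping contrast <-> non-contrast CT.
    A scan translated to the other domain and back should match itself."""
    rec_contrast = G_n2c(G_c2n(contrast_ct))
    rec_noncontrast = G_c2n(G_n2c(noncontrast_ct))
    return (F.l1_loss(rec_contrast, contrast_ct)
            + F.l1_loss(rec_noncontrast, noncontrast_ct))
```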
- Unpaired cross-modality educed distillation (CMEDL) applied to CT lung tumor segmentation [4.409836695738518]
We develop a new cross-modality educed distillation (CMEDL) approach, using unpaired CT and MRI scans.
Our framework uses an end-to-end trained unpaired I2I translation, teacher, and student segmentation networks.
arXiv Detail & Related papers (2021-07-16T15:58:15Z)
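The distillation half of CMEDL can be sketched as a CT student matching a teacher that sees I2I-translated pseudo-MRI. The KL divergence on temperature-softened segmentation outputs below is a generic distillation choice, not necessarily the paper's exact loss.

```python
# Generic cross-modality distillation sketch: a CT student matches a
# teacher fed with I2I-translated pseudo-MRI (the softened-KL form is
# an assumption, not necessarily CMEDL's exact loss).
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Voxel-wise KL between softened segmentation outputs; logits are
    shaped (B, C, D, H, W) with C segmentation classes."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    q = F.softmax(teacher_logits.detach() / T, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (T * T)
```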
- Robust Pancreatic Ductal Adenocarcinoma Segmentation with Multi-Institutional Multi-Phase Partially-Annotated CT Scans [25.889684822655255]
Pancreatic ductal adenocarcinoma (PDAC) segmentation is one of the most challenging tumor segmentation tasks.
Based on a new self-learning framework, we propose to train the PDAC segmentation model on a much larger, partially-annotated patient cohort.
Experiment results show that our proposed method provides an absolute improvement of 6.3% Dice score over the strong baseline of nnUNet trained on annotated images.
arXiv Detail & Related papers (2020-08-24T18:50:30Z)
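The self-learning framework implies a pseudo-labeling loop: a model trained on the annotated subset predicts on unannotated scans, and confident predictions are added to the training set. The loop below is a generic sketch of that idea; the confidence thresholds and model interface are assumptions.

```python
# Generic pseudo-labeling round in the spirit of the paper's
# self-learning framework (thresholds and interface are assumptions).
import torch

def self_learning_round(model, unlabeled_scans, confidence=0.9):
    """Predicts on unannotated scans and keeps those whose voxel-wise
    foreground probabilities are mostly confident; returns
    (scan, pseudo_mask) pairs for the next training round."""
    model.eval()
    pseudo_pairs = []
    with torch.no_grad():
        for scan in unlabeled_scans:              # scan: (1, 1, D, H, W)
            prob = torch.sigmoid(model(scan))     # PDAC probability map
            sure = (prob > confidence) | (prob < 1 - confidence)
            if sure.float().mean() > 0.99:        # mostly confident scan
                pseudo_pairs.append((scan, (prob > 0.5).float()))
    return pseudo_pairs  # retrain on annotated + pseudo-labeled scans
```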
- A$^3$DSegNet: Anatomy-aware artifact disentanglement and segmentation network for unpaired segmentation, artifact reduction, and modality translation [18.500206499468902]
CBCT images are low-quality and artifact-laden due to noise, poor tissue contrast, and the presence of metallic objects.
There exists a wealth of artifact-free, high-quality CT images with vertebra annotations.
This motivates us to build a CBCT vertebra segmentation model using unpaired CT images with annotations.
arXiv Detail & Related papers (2020-01-02T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.