Medical Semantic Segmentation with Diffusion Pretrain
- URL: http://arxiv.org/abs/2501.19265v1
- Date: Fri, 31 Jan 2025 16:25:49 GMT
- Title: Medical Semantic Segmentation with Diffusion Pretrain
- Authors: David Li, Anvar Kurmukov, Mikhail Goncharov, Roman Sokolov, Mikhail Belyaev
- Abstract summary: Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks.
We propose a novel pretraining strategy using diffusion models with anatomical guidance, tailored to the intricacies of 3D medical image data.
We employ an additional model that predicts 3D universal body-part coordinates, providing guidance during the diffusion process.
- Score: 1.9415817267757087
- Abstract: Recent advances in deep learning have shown that learning robust feature representations is critical for the success of many computer vision tasks, including medical image segmentation. In particular, both transformer-based and convolutional architectures have benefited from leveraging pretext tasks for pretraining. However, the adoption of pretext tasks in 3D medical imaging has been less explored and remains a challenge, especially in the context of learning generalizable feature representations. We propose a novel pretraining strategy using diffusion models with anatomical guidance, tailored to the intricacies of 3D medical image data. We introduce an auxiliary diffusion process to pretrain a model that produces generalizable feature representations, useful for a variety of downstream segmentation tasks. We employ an additional model that predicts 3D universal body-part coordinates, providing guidance during the diffusion process and improving spatial awareness in the generated representations. This approach not only aids in resolving localization inaccuracies but also enriches the model's ability to understand complex anatomical structures. Empirical validation on a 13-class organ segmentation task demonstrates the effectiveness of our pretraining technique. It surpasses existing restorative pretraining methods in 3D medical image segmentation by $7.5\%$, and is competitive with the state-of-the-art contrastive pretraining approach, achieving an average Dice coefficient of 67.8 in a non-linear evaluation scenario.
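The abstract does not give implementation details, but the training loop it describes (denoising diffusion pretraining of a segmentation backbone, conditioned on predicted 3D body-part coordinates) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' code: the module names, network sizes, the linear noise schedule, and the frozen coordinate-predictor placeholder are all assumptions, and timestep conditioning of the network is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder3D(nn.Module):
    """Tiny 3D encoder standing in for the segmentation backbone being pretrained."""
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, 3, padding=1), nn.GroupNorm(8, feat_ch), nn.SiLU(),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.GroupNorm(8, feat_ch), nn.SiLU(),
        )
    def forward(self, x):
        return self.net(x)

class DenoisingHead(nn.Module):
    """Predicts the added noise from backbone features plus anatomical guidance channels."""
    def __init__(self, feat_ch=32, guidance_ch=3, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(feat_ch + guidance_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv3d(feat_ch, out_ch, 1),
        )
    def forward(self, feats, guidance):
        return self.net(torch.cat([feats, guidance], dim=1))

# DDPM-style forward process with a simple linear noise schedule (illustrative values).
T = 1000
betas = torch.linspace(1e-4, 2e-2, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

encoder = Encoder3D()
head = DenoisingHead()
coord_model = nn.Conv3d(1, 3, 1)  # placeholder for the frozen body-part coordinate predictor
coord_model.requires_grad_(False)
optimizer = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)

def pretrain_step(volume):
    """One denoising pretraining step on a batch of CT patches (B, 1, D, H, W)."""
    t = torch.randint(0, T, (volume.size(0),))
    noise = torch.randn_like(volume)
    noisy = q_sample(volume, t, noise)
    with torch.no_grad():
        coords = coord_model(volume)   # anatomical guidance: per-voxel body-part coordinates
    feats = encoder(noisy)             # features later reused for downstream segmentation
    pred_noise = head(feats, coords)
    loss = F.mse_loss(pred_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    batch = torch.randn(2, 1, 32, 64, 64)  # dummy CT patches
    print(pretrain_step(batch))
```

After pretraining, the encoder weights would be transferred to the downstream segmentation network and either fine-tuned or, presumably in the paper's non-linear evaluation scenario, kept frozen beneath a lightly trained segmentation head.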
Related papers
- Self-adaptive vision-language model for 3D segmentation of pulmonary artery and vein [18.696258519327095]
This paper proposes a novel framework called Language-guided self-adaptive Cross-Attention Fusion Framework.
Our method adopts pre-trained CLIP as a strong feature extractor to generate segmentations of 3D CT scans.
We extensively validate our method on a local dataset, which is the largest pulmonary artery-vein CT dataset to date.
arXiv Detail & Related papers (2025-01-07T12:03:02Z) - Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning [52.249748801637196]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning.
Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation.
We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z) - MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation [25.74088298769155]
We propose a universal training framework called MedContext for 3D medical segmentation.
Our approach effectively learns self-supervised contextual cues jointly with the supervised voxel segmentation task.
The effectiveness of MedContext is validated across multiple 3D medical datasets and four state-of-the-art model architectures.
arXiv Detail & Related papers (2024-02-27T17:58:05Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight initialization approach for hybrid medical image segmentation models.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Forward-Forward Contrastive Learning [4.465144120325802]
We propose Forward-Forward Contrastive Learning (FFCL) as a novel pretraining approach for medical image classification.
FFCL outperforms existing pretraining models on the pneumonia classification task, with a 3.69% accuracy improvement over an ImageNet-pretrained ResNet-18.
arXiv Detail & Related papers (2023-05-04T15:29:06Z) - Self Context and Shape Prior for Sensorless Freehand 3D Ultrasound Reconstruction [61.62191904755521]
3D freehand US reconstruction is promising for addressing this problem, as it provides a broad scan range and freeform scanning.
Existing deep learning-based methods focus only on basic cases of skill sequences.
We propose a novel approach to sensorless freehand 3D US reconstruction considering the complex skill sequences.
arXiv Detail & Related papers (2021-07-31T16:06:50Z) - On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models pretrained on ImageNet show a significant increase in performance, generalization, and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of features from the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z) - Learning to Segment Anatomical Structures Accurately from One Exemplar [34.287877547953194]
Methods that can produce accurate anatomical structure segmentations without a large amount of fully annotated training images are highly desirable.
We propose Contour Transformer Network (CTN), a one-shot anatomy segmentor including a naturally built-in human-in-the-loop mechanism.
We demonstrate that our one-shot learning method significantly outperforms non-learning-based methods and performs competitively to the state-of-the-art fully supervised deep learning approaches.
arXiv Detail & Related papers (2020-07-06T20:27:38Z)