Multi-modal Vision Pre-training for Medical Image Analysis
- URL: http://arxiv.org/abs/2410.10604v2
- Date: Fri, 14 Mar 2025 14:32:09 GMT
- Title: Multi-modal Vision Pre-training for Medical Image Analysis
- Authors: Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang,
- Abstract summary: Self-supervised learning has greatly facilitated medical image analysis by suppressing the training data requirement for real-world applications.<n>We conduct a novel multi-modal image pre-training with three proxy tasks to facilitate the learning of cross-modality representations and correlations.<n>Our method is reported in comparison to state-of-the-art pre-training methods, with Dice Score improvement of 0.28%-14.47% across six segmentation benchmarks and a consistent accuracy boost of 0.65%-18.07% in four individual image classification tasks.
- Score: 11.569448567735435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has greatly facilitated medical image analysis by suppressing the training data requirement for real-world applications. Current paradigms predominantly rely on self-supervision within uni-modal image data, thereby neglecting the inter-modal correlations essential for effective learning of cross-modal image representations. This limitation is particularly significant for naturally grouped multi-modal data, e.g., multi-parametric MRI scans for a patient undergoing various functional imaging protocols in the same study. To bridge this gap, we conduct a novel multi-modal image pre-training with three proxy tasks to facilitate the learning of cross-modality representations and correlations using multi-modal brain MRI scans (over 2.4 million images in 16,022 scans of 3,755 patients), i.e., cross-modal image reconstruction, modality-aware contrastive learning, and modality template distillation. To demonstrate the generalizability of our pre-trained model, we conduct extensive experiments on various benchmarks with ten downstream tasks. The superior performance of our method is reported in comparison to state-of-the-art pre-training methods, with Dice Score improvement of 0.28\%-14.47\% across six segmentation benchmarks and a consistent accuracy boost of 0.65\%-18.07\% in four individual image classification tasks.
Related papers
- ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models [103.25208095165486]
Existing practices rely on powerful but costly large language models (LLMs) or multimodal language models (MLMs) to produce instruction data.
We present a programmatic approach that employs scene graphs as symbolic representations of images and human-written programs to systematically synthesize vision-centric instruction data.
Our approach ensures the interpretability and controllability of the data generation process and scales efficiently while maintaining factual accuracy.
arXiv Detail & Related papers (2024-12-09T21:44:02Z) - A Self-Supervised Model for Multi-modal Stroke Risk Prediction [0.1671198589006117]
Predicting stroke risk is a complex challenge that can be enhanced by integrating diverse clinically available data modalities.
This study introduces a self-supervised multimodal framework that combines 3D brain imaging, clinical data, and image-derived features to improve stroke risk prediction prior to onset.
arXiv Detail & Related papers (2024-11-14T22:00:37Z) - Towards a vision foundation model for comprehensive assessment of Cardiac MRI [11.838157772803282]
We introduce a vision foundation model trained for cardiac magnetic resonance imaging (CMR) assessment.
We finetune the model in supervised way for 9 clinical tasks typical to a CMR workflow.
We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes.
arXiv Detail & Related papers (2024-10-02T15:32:01Z) - Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z) - Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features [5.660131312162423]
Parkinson's Disease (PD) affects millions globally, impacting movement.
Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure.
This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification.
arXiv Detail & Related papers (2023-11-25T02:32:46Z) - fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for
Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training.
Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns.
Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z) - MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep
Models for X-ray Images of Multiple Body Parts [63.30352394004674]
Multi-task Self-super-vised Continual Learning (MUSCLE) is a novel self-supervised pre-training pipeline for medical imaging tasks.
MUSCLE aggregates X-rays collected from multiple body parts for representation learning, and adopts a well-designed continual learning procedure.
We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection.
arXiv Detail & Related papers (2023-10-03T12:19:19Z) - Multi-modal Graph Neural Network for Early Diagnosis of Alzheimer's
Disease from sMRI and PET Scans [11.420077093805382]
We propose to use graph neural networks (GNN) that are designed to deal with problems in non-Euclidean domains.
In this study, we demonstrate how brain networks can be created from sMRI or PET images.
We then present a multi-modal GNN framework where each modality has its own branch of GNN and a technique is proposed to combine the multi-modal data.
arXiv Detail & Related papers (2023-07-31T02:04:05Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - GraVIS: Grouping Augmented Views from Independent Sources for
Dermatology Analysis [52.04899592688968]
We propose GraVIS, which is specifically optimized for learning self-supervised features from dermatology images.
GraVIS significantly outperforms its transfer learning and self-supervised learning counterparts in both lesion segmentation and disease classification tasks.
arXiv Detail & Related papers (2023-01-11T11:38:37Z) - Uncertainty-Aware Multi-Parametric Magnetic Resonance Image Information
Fusion for 3D Object Segmentation [12.361668672097753]
We propose an uncertainty-aware multi-parametric MR image feature fusion method to fully exploit the information for enhanced 3D image segmentation.
Our proposed method achieves better segmentation performance when compared to existing models.
arXiv Detail & Related papers (2022-11-16T09:16:52Z) - DIGEST: Deeply supervIsed knowledGE tranSfer neTwork learning for brain
tumor segmentation with incomplete multi-modal MRI scans [16.93394669748461]
Brain tumor segmentation based on multi-modal magnetic resonance imaging (MRI) plays a pivotal role in assisting brain cancer diagnosis, treatment, and postoperative evaluations.
Despite the achieved inspiring performance by existing automatic segmentation methods, multi-modal MRI data are still unavailable in real-world clinical applications.
We propose a Deeply supervIsed knowledGE tranSfer neTwork (DIGEST), which achieves accurate brain tumor segmentation under different modality-missing scenarios.
arXiv Detail & Related papers (2022-11-15T09:01:14Z) - Model-Guided Multi-Contrast Deep Unfolding Network for MRI
Super-resolution Reconstruction [68.80715727288514]
We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix.
In this paper, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction.
arXiv Detail & Related papers (2022-09-15T03:58:30Z) - FAST-AID Brain: Fast and Accurate Segmentation Tool using Artificial
Intelligence Developed for Brain [0.8376091455761259]
A novel deep learning method is proposed for fast and accurate segmentation of the human brain into 132 regions.
The proposed model uses an efficient U-Net-like network and benefits from the intersection points of different views and hierarchical relations.
The proposed method can be applied to brain MRI data including skull or any other artifacts without preprocessing the images or a drop in performance.
arXiv Detail & Related papers (2022-08-30T16:06:07Z) - Metadata-enhanced contrastive learning from retinal optical coherence tomography images [7.932410831191909]
We extend conventional contrastive frameworks with a novel metadata-enhanced strategy.
Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships.
Our approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks.
arXiv Detail & Related papers (2022-08-04T08:53:15Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z) - Self-supervised Learning from 100 Million Medical Images [13.958840691105992]
We propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering.
We leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography.
We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR.
arXiv Detail & Related papers (2022-01-04T18:27:04Z) - Multi-modal Aggregation Network for Fast MR Imaging [85.25000133194762]
We propose a novel Multi-modal Aggregation Network, named MANet, which is capable of discovering complementary representations from a fully sampled auxiliary modality.
In our MANet, the representations from the fully sampled auxiliary and undersampled target modalities are learned independently through a specific network.
Our MANet follows a hybrid domain learning framework, which allows it to simultaneously recover the frequency signal in the $k$-space domain.
arXiv Detail & Related papers (2021-10-15T13:16:59Z) - Modality Completion via Gaussian Process Prior Variational Autoencoders
for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize the subjects/patients and sub-modalities correlations.
We show the applicability of MGP-VAE on brain tumor segmentation where either, two, or three of four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z) - Latent Correlation Representation Learning for Brain Tumor Segmentation
with Missing MRI Modalities [2.867517731896504]
Accurately segmenting brain tumor from MR images is the key to clinical diagnostics and treatment planning.
It's common to miss some imaging modalities in clinical practice.
We present a novel brain tumor segmentation algorithm with missing modalities.
arXiv Detail & Related papers (2021-04-13T14:21:09Z) - A Multi-Stage Attentive Transfer Learning Framework for Improving
COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis.
Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains.
Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.