Related papers: BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI

BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI

URL: http://arxiv.org/abs/2410.10604v1
Date: Mon, 14 Oct 2024 15:12:16 GMT
Title: BrainMVP: Multi-modal Vision Pre-training for Brain Image Analysis using Multi-parametric MRI
Authors: Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang,
Abstract summary: BrainMVP is a multi-modal vision pre-training framework for brain image analysis using multi-parametric MRI scans. Cross-modal reconstruction is explored to learn distinctive brain image embeddings and efficient modality fusion capabilities. Experiments on downstream tasks demonstrate superior performance compared to state-of-the-art pre-training methods in the medical domain.
Score: 11.569448567735435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate diagnosis of brain abnormalities is greatly enhanced by the inclusion of complementary multi-parametric MRI imaging data. There is significant potential to develop a universal pre-training model that can be quickly adapted for image modalities and various clinical scenarios. However, current models often rely on uni-modal image data, neglecting the cross-modal correlations among different image modalities or struggling to scale up pre-training in the presence of missing modality data. In this paper, we propose BrainMVP, a multi-modal vision pre-training framework for brain image analysis using multi-parametric MRI scans. First, we collect 16,022 brain MRI scans (over 2.4 million images), encompassing eight MRI modalities sourced from a diverse range of centers and devices. Then, a novel pre-training paradigm is proposed for the multi-modal MRI data, addressing the issue of missing modalities and achieving multi-modal information fusion. Cross-modal reconstruction is explored to learn distinctive brain image embeddings and efficient modality fusion capabilities. A modality-wise data distillation module is proposed to extract the essence representation of each MR image modality for both the pre-training and downstream application purposes. Furthermore, we introduce a modality-aware contrastive learning module to enhance the cross-modality association within a study. Extensive experiments on downstream tasks demonstrate superior performance compared to state-of-the-art pre-training methods in the medical domain, with Dice Score improvement of 0.28%-14.47% across six segmentation benchmarks and a consistent accuracy improvement of 0.65%-18.07% in four individual classification tasks.

Related papers

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models [103.25208095165486]
Existing practices rely on powerful but costly large language models (LLMs) or multimodal language models (MLMs) to produce instruction data. We present a programmatic approach that employs scene graphs as symbolic representations of images and human-written programs to systematically synthesize vision-centric instruction data. Our approach ensures the interpretability and controllability of the data generation process and scales efficiently while maintaining factual accuracy.
arXiv Detail & Related papers (2024-12-09T21:44:02Z)
A Self-Supervised Model for Multi-modal Stroke Risk Prediction [0.1671198589006117]
Predicting stroke risk is a complex challenge that can be enhanced by integrating diverse clinically available data modalities. This study introduces a self-supervised multimodal framework that combines 3D brain imaging, clinical data, and image-derived features to improve stroke risk prediction prior to onset.
arXiv Detail & Related papers (2024-11-14T22:00:37Z)
Towards a vision foundation model for comprehensive assessment of Cardiac MRI [11.838157772803282]
We introduce a vision foundation model trained for cardiac magnetic resonance imaging (CMR) assessment. We finetune the model in supervised way for 9 clinical tasks typical to a CMR workflow. We demonstrate improved accuracy and robustness across all tasks, over a range of available labeled dataset sizes.
arXiv Detail & Related papers (2024-10-02T15:32:01Z)
Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework. Our approach various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals. By training with about 67,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
Parkinson's Disease Classification Using Contrastive Graph Cross-View Learning with Multimodal Fusion of SPECT Images and Clinical Features [5.660131312162423]
Parkinson's Disease (PD) affects millions globally, impacting movement. Prior research utilized deep learning for PD prediction, primarily focusing on medical images, neglecting the data's underlying manifold structure. This work proposes a multimodal approach encompassing both image and non-image features, leveraging contrastive cross-view graph fusion for PD classification.
arXiv Detail & Related papers (2023-11-25T02:32:46Z)
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding [54.17776744076334]
We propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training. Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving brain activity patterns. Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach.
arXiv Detail & Related papers (2023-11-01T07:24:22Z)
MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep Models for X-ray Images of Multiple Body Parts [63.30352394004674]
Multi-task Self-super-vised Continual Learning (MUSCLE) is a novel self-supervised pre-training pipeline for medical imaging tasks. MUSCLE aggregates X-rays collected from multiple body parts for representation learning, and adopts a well-designed continual learning procedure. We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection.
arXiv Detail & Related papers (2023-10-03T12:19:19Z)
Multi-modal Graph Neural Network for Early Diagnosis of Alzheimer's Disease from sMRI and PET Scans [11.420077093805382]
We propose to use graph neural networks (GNN) that are designed to deal with problems in non-Euclidean domains. In this study, we demonstrate how brain networks can be created from sMRI or PET images. We then present a multi-modal GNN framework where each modality has its own branch of GNN and a technique is proposed to combine the multi-modal data.
arXiv Detail & Related papers (2023-07-31T02:04:05Z)
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
GraVIS: Grouping Augmented Views from Independent Sources for Dermatology Analysis [52.04899592688968]
We propose GraVIS, which is specifically optimized for learning self-supervised features from dermatology images. GraVIS significantly outperforms its transfer learning and self-supervised learning counterparts in both lesion segmentation and disease classification tasks.
arXiv Detail & Related papers (2023-01-11T11:38:37Z)
Uncertainty-Aware Multi-Parametric Magnetic Resonance Image Information Fusion for 3D Object Segmentation [12.361668672097753]
We propose an uncertainty-aware multi-parametric MR image feature fusion method to fully exploit the information for enhanced 3D image segmentation. Our proposed method achieves better segmentation performance when compared to existing models.
arXiv Detail & Related papers (2022-11-16T09:16:52Z)
DIGEST: Deeply supervIsed knowledGE tranSfer neTwork learning for brain tumor segmentation with incomplete multi-modal MRI scans [16.93394669748461]
Brain tumor segmentation based on multi-modal magnetic resonance imaging (MRI) plays a pivotal role in assisting brain cancer diagnosis, treatment, and postoperative evaluations. Despite the achieved inspiring performance by existing automatic segmentation methods, multi-modal MRI data are still unavailable in real-world clinical applications. We propose a Deeply supervIsed knowledGE tranSfer neTwork (DIGEST), which achieves accurate brain tumor segmentation under different modality-missing scenarios.
arXiv Detail & Related papers (2022-11-15T09:01:14Z)
Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction [68.80715727288514]
We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix. In this paper, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction.
arXiv Detail & Related papers (2022-09-15T03:58:30Z)
FAST-AID Brain: Fast and Accurate Segmentation Tool using Artificial Intelligence Developed for Brain [0.8376091455761259]
A novel deep learning method is proposed for fast and accurate segmentation of the human brain into 132 regions. The proposed model uses an efficient U-Net-like network and benefits from the intersection points of different views and hierarchical relations. The proposed method can be applied to brain MRI data including skull or any other artifacts without preprocessing the images or a drop in performance.
arXiv Detail & Related papers (2022-08-30T16:06:07Z)
Metadata-enhanced contrastive learning from retinal optical coherence tomography images [7.932410831191909]
We extend conventional contrastive frameworks with a novel metadata-enhanced strategy. Our approach employs widely available patient metadata to approximate the true set of inter-image contrastive relationships. Our approach outperforms both standard contrastive methods and a retinal image foundation model in five out of six image-level downstream tasks.
arXiv Detail & Related papers (2022-08-04T08:53:15Z)
Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data. The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale. Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)
Self-supervised Learning from 100 Million Medical Images [13.958840691105992]
We propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering. We leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography. We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR.
arXiv Detail & Related papers (2022-01-04T18:27:04Z)
Multi-modal Aggregation Network for Fast MR Imaging [85.25000133194762]
We propose a novel Multi-modal Aggregation Network, named MANet, which is capable of discovering complementary representations from a fully sampled auxiliary modality. In our MANet, the representations from the fully sampled auxiliary and undersampled target modalities are learned independently through a specific network. Our MANet follows a hybrid domain learning framework, which allows it to simultaneously recover the frequency signal in the $k$-space domain.
arXiv Detail & Related papers (2021-10-15T13:16:59Z)
Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan. MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize the subjects/patients and sub-modalities correlations. We show the applicability of MGP-VAE on brain tumor segmentation where either, two, or three of four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
Latent Correlation Representation Learning for Brain Tumor Segmentation with Missing MRI Modalities [2.867517731896504]
Accurately segmenting brain tumor from MR images is the key to clinical diagnostics and treatment planning. It's common to miss some imaging modalities in clinical practice. We present a novel brain tumor segmentation algorithm with missing modalities.
arXiv Detail & Related papers (2021-04-13T14:21:09Z)
A Multi-Stage Attentive Transfer Learning Framework for Improving COVID-19 Diagnosis [49.3704402041314]
We propose a multi-stage attentive transfer learning framework for improving COVID-19 diagnosis. Our proposed framework consists of three stages to train accurate diagnosis models through learning knowledge from multiple source tasks and data of different domains. Importantly, we propose a novel self-supervised learning method to learn multi-scale representations for lung CT images.
arXiv Detail & Related papers (2021-01-14T01:39:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.