Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)
- URL: http://arxiv.org/abs/2507.00185v1
- Date: Mon, 30 Jun 2025 18:50:31 GMT
- Title: Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)
- Authors: Yang Zhou, Chrystie Wan Ning Quek, Jun Zhou, Yan Wang, Yang Bai, Yuhe Ke, Jie Yao, Laura Gutierrez, Zhen Ling Teo, Darren Shu Jeng Ting, Brian T. Soetikno, Christopher S. Nielsen, Tobias Elze, Zengxiang Li, Linh Le Dinh, Lionel Tim-Ee Cheng, Tran Nguyen Tuan Anh, Chee Leong Cheng, Tien Yin Wong, Nan Liu, Iain Beehuat Tan, Tony Kiat Hon Lim, Rick Siow Mong Goh, Yong Liu, Daniel Shu Wei Ting
- Abstract summary: We developed MerMED-FM, a state-of-the-art multimodal foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT); 0.982 (pathology); 0.951 (US); 0.943 (CT); 0.931 (skin); 0.894 (CFP); 0.858 (CXR).
- Score: 22.690349928759986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current artificial intelligence models for medical imaging are predominantly single modality and single disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities, including computed tomography (CT), chest X-rays (CXR), ultrasound (US), pathology patches, color fundus photography (CFP), optical coherence tomography (OCT) and dermatology images. MerMED-FM was evaluated across multiple diseases and compared against existing foundation models. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT); 0.982 (pathology); 0.951 (US); 0.943 (CT); 0.931 (skin); 0.894 (CFP); 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable, versatile, cross-specialty foundation model that enables robust medical imaging interpretation across diverse medical disciplines.
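The abstract names two training ingredients, self-supervised learning and a memory module, without detailing either. As a point of reference only, here is a minimal PyTorch sketch of one common pattern such a combination can take: a momentum-updated key encoder with a FIFO feature queue acting as the memory, in the style of MoCo. Every class name, dimension, and hyperparameter below is an illustrative assumption, not MerMED-FM's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemorySSL(nn.Module):
    """Hypothetical sketch: self-supervised learning with a feature memory
    queue and a momentum encoder (MoCo-style). NOT MerMED-FM's actual design."""

    def __init__(self, encoder_fn, dim=128, queue_size=4096, momentum=0.99, temp=0.07):
        super().__init__()
        self.encoder_q = encoder_fn()          # online (query) encoder
        self.encoder_k = encoder_fn()          # momentum (key) encoder
        self.encoder_k.load_state_dict(self.encoder_q.state_dict())
        for p in self.encoder_k.parameters():
            p.requires_grad = False
        self.momentum, self.temp = momentum, temp
        # memory module: a FIFO queue of past key features
        self.register_buffer("queue", F.normalize(torch.randn(queue_size, dim), dim=1))
        self.register_buffer("ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _update_key_encoder(self):
        for q, k in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            k.data = self.momentum * k.data + (1 - self.momentum) * q.data

    @torch.no_grad()
    def _enqueue(self, keys):
        n, ptr = keys.shape[0], int(self.ptr)
        idx = torch.arange(ptr, ptr + n) % self.queue.shape[0]
        self.queue[idx] = keys
        self.ptr[0] = (ptr + n) % self.queue.shape[0]

    def forward(self, view1, view2):
        q = F.normalize(self.encoder_q(view1), dim=1)      # queries
        with torch.no_grad():
            self._update_key_encoder()
            k = F.normalize(self.encoder_k(view2), dim=1)  # keys
        # positive: the matching view; negatives: everything in the memory queue
        l_pos = (q * k).sum(dim=1, keepdim=True)
        l_neg = q @ self.queue.t()
        logits = torch.cat([l_pos, l_neg], dim=1) / self.temp
        labels = torch.zeros(q.shape[0], dtype=torch.long)  # positives at index 0
        loss = F.cross_entropy(logits, labels)
        self._enqueue(k)
        return loss

# toy usage: a flat linear "encoder" over two augmented 32x32 RGB views
enc = lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = MemorySSL(enc)
v1, v2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
print(model(v1, v2))  # scalar contrastive loss
```

The usual motivation for a memory of this kind is that the queue lets the contrastive loss draw negatives from far more images than fit in a single batch.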
Related papers
- iMedImage Technical Report [5.0953390013898705]
Chromosome karyotype analysis is crucial for diagnosing hereditary diseases, yet detecting structural abnormalities remains challenging. We developed iMedImage, an end-to-end model for general medical image recognition, demonstrating strong performance across multiple imaging tasks.
arXiv Detail & Related papers (2025-03-27T03:25:28Z)
- MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin [3.937224424603788]
MedMimic is introduced as a multimodal framework inspired by real-world diagnostic processes. It uses pretrained models such as DINOv2, Vision Transformer, and ResNet-18 to convert high-dimensional 18F-FDG PET/CT imaging into semantically meaningful features. A learnable self-attention-based fusion network integrates these imaging features with clinical data for classification (a toy sketch of such a fusion head follows this entry).
arXiv Detail & Related papers (2025-02-07T09:57:03Z)
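The MedMimic entry above describes a learnable self-attention fusion of pretrained imaging features with clinical data. Below is a minimal sketch of what such a fusion head can look like in PyTorch; the 768-dimensional image embedding (DINOv2-sized), the 32 clinical variables, and all layer sizes are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical fusion head: project imaging and clinical features into a
    shared space, treat them as a two-token sequence, and let self-attention
    mix them before classification. Dimensions are illustrative."""

    def __init__(self, img_dim=768, clin_dim=32, hidden=256, n_classes=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.clin_proj = nn.Linear(clin_dim, hidden)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, img_feat, clin_feat):
        # (batch, 2, hidden): one token per modality
        tokens = torch.stack([self.img_proj(img_feat), self.clin_proj(clin_feat)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)   # self-attention over modalities
        return self.head(fused.mean(dim=1))            # pool tokens, then classify

# toy usage: DINOv2-sized image embeddings plus 32 clinical variables
model = AttentionFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 32))
print(logits.shape)  # torch.Size([4, 2])
```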
- Vision Foundation Models for Computed Tomography [0.5320113414681007]
Foundation models (FMs) have shown transformative potential in radiology by performing diverse, complex tasks across imaging modalities. Here, we developed CT-FM, a large-scale 3D image-based pre-trained model designed explicitly for various radiological tasks. CT-FM was pre-trained using 148,000 computed tomography (CT) scans from the Imaging Data Commons through label-agnostic contrastive learning (a generic example of such an objective follows this entry).
arXiv Detail & Related papers (2025-01-15T18:30:58Z)
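CT-FM is said to be pre-trained through label-agnostic contrastive learning; the digest does not give the exact objective, so the snippet below shows a generic in-batch InfoNCE loss (SimCLR-style) purely as a reference implementation of the technique named. Batch size, embedding dimension, and temperature are arbitrary.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temp=0.1):
    """Generic in-batch contrastive loss between two augmented views.
    z1, z2: (batch, dim) embeddings of the same scans under different crops."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temp            # (batch, batch) similarity matrix
    labels = torch.arange(z1.shape[0])     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

print(info_nce(torch.randn(16, 128), torch.randn(16, 128)))
```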
- Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images, fitting them for the complex task of counterfactual image generation. We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z)
- MedDiff-FM: A Diffusion-based Foundation Model for Versatile Medical Image Applications [10.321593505248341]
This paper introduces MedDiff-FM, a diffusion-based foundation model that addresses a diverse range of medical image tasks. MedDiff-FM leverages 3D CT images covering anatomical regions from head to abdomen to pre-train a diffusion foundation model. The experimental results demonstrate the effectiveness of MedDiff-FM on diverse downstream medical image tasks (a minimal denoising objective is sketched after this entry).
arXiv Detail & Related papers (2024-10-20T16:03:55Z)
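MedDiff-FM is a diffusion foundation model; the standard training signal for such models is the denoising (noise-prediction) objective sketched below. The linear schedule, the 2D tensor layout, and the toy stand-in model are assumptions for brevity — MedDiff-FM itself operates on 3D CT volumes, and its exact objective is not given here.

```python
import torch
import torch.nn as nn

def ddpm_loss(model, x0, n_steps=1000):
    """Standard denoising objective: corrupt x0 with Gaussian noise at a
    random timestep and train the model to predict that noise.
    Assumes 4D (batch, channel, H, W) inputs for simplicity."""
    betas = torch.linspace(1e-4, 0.02, n_steps)        # linear noise schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, n_steps, (x0.shape[0],))      # random timestep per sample
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise     # forward diffusion step
    return nn.functional.mse_loss(model(x_t, t), noise)

# toy stand-in for a denoising U-Net (a real model would condition on t)
toy = lambda x, t: torch.zeros_like(x)
print(ddpm_loss(toy, torch.randn(2, 1, 16, 16)))
```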
- FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models [54.09244105445476]
This study introduces FedKIM, a novel knowledge injection approach for scaling medical foundation models within a federated learning framework. FedKIM leverages lightweight local models to extract healthcare knowledge from private data and integrates this knowledge into a centralized foundation model. Our experiments across twelve tasks in seven modalities demonstrate the effectiveness of FedKIM in various settings (a baseline federated averaging step is sketched after this entry for contrast).
arXiv Detail & Related papers (2024-08-17T15:42:29Z)
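FedKIM injects knowledge distilled by lightweight local models into a central foundation model, which is more involved than plain weight averaging. For contrast with that design, here is the baseline federated step (FedAvg-style weighted averaging) that such approaches typically improve on; this is explicitly not FedKIM's algorithm.

```python
import torch

def fedavg(client_states, client_sizes):
    """Average client model weights, weighted by local dataset size.
    A baseline federated step; FedKIM itself injects distilled knowledge
    into a central foundation model rather than averaging weights."""
    total = sum(client_sizes)
    keys = client_states[0].keys()
    return {
        k: sum(s[k] * (n / total) for s, n in zip(client_states, client_sizes))
        for k in keys
    }

# toy usage: three "clients" with a single-parameter model
states = [{"w": torch.tensor([1.0])}, {"w": torch.tensor([2.0])}, {"w": torch.tensor([4.0])}]
print(fedavg(states, [10, 10, 20]))  # {'w': tensor([2.7500])}
```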
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation [55.51412454263856]
This paper proposes to directly modulate the generation process of diffusion models using fMRI signals.
By training with about 67,000 fMRI-image pairs from various individuals, our model achieves superior fMRI-to-image decoding capacity.
arXiv Detail & Related papers (2024-03-27T02:42:52Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
LLaVA-Rad inference is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction [4.659998272408215]
Early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates.
We propose a multimodal fusion methodology, termed PE-MVCNet, which capitalizes on Computed Tomography Pulmonary Angiography imaging and EMR data.
Our proposed model outperforms existing methodologies, confirming that our multimodal fusion model excels compared to models that use a single data modality.
arXiv Detail & Related papers (2024-02-27T03:53:27Z)
- Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundation model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Cross-Modal Information Maximization for Medical Imaging: CMIM [62.28852442561818]
In hospitals, data are siloed in specific information systems that make the same information available under different modalities.
This offers unique opportunities to obtain and use at train-time those multiple views of the same information that might not always be available at test-time.
We propose an innovative framework that makes the most of available data by learning good representations of a multi-modal input that are resilient to modality dropping at test-time (a toy modality-dropout sketch follows this entry).
arXiv Detail & Related papers (2020-10-20T20:05:35Z)
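CMIM's stated goal is representations that survive modality dropping at test-time. One simple and common way to train for that property is to randomly zero out whole modalities during training, sketched below; the two-modality setup, dimensions, and mean-pooling fusion are illustrative assumptions, not CMIM's actual information-maximization objective.

```python
import torch
import torch.nn as nn

class ModalityDropoutFusion(nn.Module):
    """Sketch of training for robustness to missing modalities: randomly
    zero out an entire modality during training so the fused representation
    cannot over-rely on any single input. Details are illustrative."""

    def __init__(self, dims=(512, 512), hidden=256, p_drop=0.3):
        super().__init__()
        self.projs = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.p_drop = p_drop

    def forward(self, feats):
        out = []
        for proj, f in zip(self.projs, feats):
            if self.training and torch.rand(()) < self.p_drop:
                f = torch.zeros_like(f)              # drop this modality entirely
            out.append(proj(f))
        return torch.stack(out, dim=0).mean(dim=0)   # modality-averaged embedding

# toy usage: two 512-dim modality feature vectors per sample
model = ModalityDropoutFusion()
z = model([torch.randn(4, 512), torch.randn(4, 512)])
print(z.shape)  # torch.Size([4, 256])
```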
- Universal Model for Multi-Domain Medical Image Retrieval [88.67940265012638]
Medical Image Retrieval (MIR) helps doctors quickly find similar patients' data.
MIR is becoming increasingly helpful due to the wide use of digital imaging modalities.
However, the popularity of various digital imaging modalities in hospitals also poses several challenges to MIR (a minimal embedding-retrieval lookup is sketched after this entry).
arXiv Detail & Related papers (2020-07-14T23:22:04Z)
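Whatever the universal model's internals, content-based medical image retrieval ultimately reduces to nearest-neighbour search in an embedding space. The snippet below shows that final lookup step with cosine similarity; the gallery size and embedding dimension are placeholders, and the embeddings themselves would come from a model like the one the paper proposes.

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, gallery_embs, k=5):
    """Return indices of the k most similar gallery images by cosine similarity."""
    q = F.normalize(query_emb, dim=0)
    g = F.normalize(gallery_embs, dim=1)
    scores = g @ q                       # (num_gallery,) similarity scores
    return scores.topk(k).indices

gallery = torch.randn(1000, 256)         # pre-computed image embeddings
print(retrieve(torch.randn(256), gallery, k=5))
```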
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.