CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
- URL: http://arxiv.org/abs/2503.07667v2
- Date: Thu, 20 Mar 2025 05:05:56 GMT
- Title: CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
- Authors: Wei Dai, Peilin Chen, Malinda Lu, Daniel Li, Haowen Wei, Hejie Cui, Paul Pu Liang,
- Abstract summary: We introduce Clinical Large-Scale Integrative Multimodal Benchmark (CLIMB). CLIMB is a comprehensive benchmark unifying diverse clinical data across imaging, language, temporal, and graph modalities. Pretraining on CLIMB effectively improves models' generalization capability to new tasks, and strong unimodal encoder performance translates well to multimodal performance when paired with task-appropriate fusion strategies.
- Score: 27.726366396356763
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in clinical AI have enabled remarkable progress across many clinical domains. However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce Clinical Large-Scale Integrative Multimodal Benchmark (CLIMB), a comprehensive clinical benchmark unifying diverse clinical data across imaging, language, temporal, and graph modalities. CLIMB comprises 4.51 million patient samples totaling 19.01 terabytes distributed across 2D imaging, 3D video, time series, graphs, and multimodal data. Through extensive empirical evaluation, we demonstrate that multitask pretraining significantly improves performance on understudied domains, achieving up to 29% improvement in ultrasound and 23% in ECG analysis over single-task learning. Pretraining on CLIMB also effectively improves models' generalization capability to new tasks, and strong unimodal encoder performance translates well to multimodal performance when paired with task-appropriate fusion strategies. Our findings provide a foundation for new architecture designs and pretraining strategies to advance clinical AI research. Code is released at https://github.com/DDVD233/climb.
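The abstract's recipe-level findings (multitask pretraining over heterogeneous clinical data, plus pairing strong unimodal encoders with task-appropriate fusion) can be made concrete with a small sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the released CLIMB code (see the repository linked above): the toy encoders, feature dimensions, task names, and concatenation-based late fusion are all illustrative.

```python
# Minimal sketch of the recipe the abstract describes: shared unimodal encoders,
# per-task heads, multitask pretraining, and a simple late-fusion step.
# This is NOT the released CLIMB code (see https://github.com/DDVD233/climb);
# the toy encoders, dimensions, task names, and concatenation fusion are assumptions.
import torch
import torch.nn as nn

class ImageEncoder(nn.Module):
    """Stand-in 2D imaging encoder (e.g., ultrasound frames or chest X-rays)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, x):
        return self.net(x)

class ECGEncoder(nn.Module):
    """Stand-in 1D time-series encoder (e.g., 12-lead ECG)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(12, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, x):
        return self.net(x)

class LateFusionMultitask(nn.Module):
    """Concatenate unimodal features and route them to one head per task."""
    def __init__(self, task_classes, dim=128):
        super().__init__()
        self.image_enc = ImageEncoder(dim)
        self.ecg_enc = ECGEncoder(dim)
        self.heads = nn.ModuleDict(
            {task: nn.Linear(2 * dim, n) for task, n in task_classes.items()}
        )

    def forward(self, image, ecg, task):
        fused = torch.cat([self.image_enc(image), self.ecg_enc(ecg)], dim=-1)
        return self.heads[task](fused)

# Hypothetical task names and class counts; the real CLIMB tasks differ.
model = LateFusionMultitask({"ecg_arrhythmia": 5, "us_view": 4})
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def fake_batch(task, batch_size=8):
    # Random tensors standing in for real patient samples.
    image = torch.randn(batch_size, 1, 64, 64)
    ecg = torch.randn(batch_size, 12, 500)
    n_classes = 5 if task == "ecg_arrhythmia" else 4
    return image, ecg, torch.randint(0, n_classes, (batch_size,))

# Multitask pretraining: alternate tasks so the shared encoders see all of them.
for step in range(4):
    task = ("ecg_arrhythmia", "us_view")[step % 2]
    image, ecg, labels = fake_batch(task)
    loss = criterion(model(image, ecg, task), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step} task={task} loss={loss.item():.3f}")
```

Concatenation stands in here for whatever task-appropriate fusion strategy is chosen; the shared encoders plus per-task heads are what make the loop multitask.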
Related papers
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
- Efficient MedSAMs: Segment Anything in Medical Images on Laptop [69.28565867103542]
We organized the first international competition dedicated to promptable medical image segmentation.
The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline.
The best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption.
arXiv Detail & Related papers (2024-12-20T17:33:35Z)
- Active Data Curation Effectively Distills Large-Scale Multimodal Models [66.23057263509027]
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones.
In this work we explore an alternative, yet simple approach -- active data curation as effective distillation for contrastive multimodal pretraining.
Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations.
arXiv Detail & Related papers (2024-11-27T18:50:15Z)
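The Active Data Curation entry above centers on online batch selection guided by a reference model. Below is a hedged sketch of that general idea under one common formulation (keep the samples the student finds hard but a frozen reference finds easy); it is not the exact ACID criterion, and the scoring rule, toy models, and batch sizes are assumptions.

```python
# Hedged sketch of online batch selection with a frozen reference model:
# from a large candidate batch, keep the samples the student finds hard but the
# reference finds easy. Illustrative only; not the exact ACID criterion, and the
# scoring rule, toy models, and sizes are assumptions.
import torch
import torch.nn as nn

def per_sample_ce(logits, labels):
    return nn.functional.cross_entropy(logits, labels, reduction="none")

def select_batch(student, reference, x, y, keep):
    """Rank candidates by (student loss - reference loss) and keep the top `keep`."""
    with torch.no_grad():
        scores = per_sample_ce(student(x), y) - per_sample_ce(reference(x), y)
    idx = torch.topk(scores, keep).indices
    return x[idx], y[idx]

# Toy stand-ins for a student being pretrained and a frozen, pretrained reference.
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
reference = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
for p in reference.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)

for step in range(3):
    # Draw a large candidate pool, then train only on the selected sub-batch.
    x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))
    x_sel, y_sel = select_batch(student, reference, x, y, keep=64)
    loss = per_sample_ce(student(x_sel), y_sel).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step} loss={loss.item():.3f}")
```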
- Rethinking Model Prototyping through the MedMNIST+ Dataset Collection [0.11999555634662634]
This work presents a benchmark for the MedMNIST+ database to diversify the evaluation landscape.
We conduct a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification.
Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-efficient approaches.
arXiv Detail & Related papers (2024-04-24T10:19:25Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- DiCoM -- Diverse Concept Modeling towards Enhancing Generalizability in Chest X-Ray Studies [6.83819481805979]
Chest X-Ray (CXR) is a widely used clinical imaging modality.
Self-supervised pre-training has proven to outperform supervised pre-training in numerous downstream vision tasks.
We introduce Diverse Concept Modeling (DiCoM), a novel self-supervised training paradigm.
arXiv Detail & Related papers (2024-02-22T20:51:37Z)
- OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease Detection [2.3349787245442966]
Our research contributes a self-supervised robust machine learning framework, OCT-SelfNet, for detecting eye diseases.
Our method addresses the issue using a two-phase training approach that combines self-supervised pretraining and supervised fine-tuning.
In terms of the AUC-PR metric, our proposed method exceeded 42%, showcasing a substantial increase of at least 10% in performance compared to the baseline.
arXiv Detail & Related papers (2024-01-22T20:17:14Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Competence-based Multimodal Curriculum Learning for Medical Report Generation [98.10763792453925]
We propose a Competence-based Multimodal Curriculum Learning framework (CMCL) to alleviate the data bias and make the best use of available data.
Specifically, CMCL simulates the learning process of radiologists and optimizes the model in a step-by-step manner.
Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
arXiv Detail & Related papers (2022-06-24T08:16:01Z)
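The CMCL entry above builds on competence-based curriculum learning. The sketch below shows the generic recipe (only the easiest fraction of the data, ranked by a difficulty score, is eligible for sampling, and that fraction grows over training); it is not the CMCL implementation, and the difficulty scores, competence schedule, and toy model are assumptions.

```python
# Hedged sketch of competence-based curriculum learning: at step t, only the
# easiest c(t) fraction of the training set (by a precomputed difficulty score)
# is eligible for sampling. Illustrative only; not the CMCL implementation, and
# the square-root competence schedule and toy difficulty scores are assumptions.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dataset: 1000 samples, 16 features, 4 classes, plus a per-sample
# difficulty score (in practice e.g. report length or model uncertainty).
x = torch.randn(1000, 16)
y = torch.randint(0, 4, (1000,))
difficulty = torch.rand(1000)
order = torch.argsort(difficulty)  # easiest first

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def competence(step, total_steps, c0=0.1):
    """Square-root competence schedule growing from c0 to 1.0."""
    return min(1.0, math.sqrt(c0**2 + (1 - c0**2) * step / total_steps))

total_steps, batch_size = 50, 32
for step in range(total_steps):
    # Restrict sampling to the easiest competence(step) fraction of the data.
    n_eligible = max(batch_size, int(competence(step, total_steps) * len(order)))
    eligible = order[:n_eligible]
    batch = eligible[torch.randint(0, n_eligible, (batch_size,))]
    loss = criterion(model(x[batch]), y[batch])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 10 == 0:
        c = competence(step, total_steps)
        print(f"step {step} competence={c:.2f} loss={loss.item():.3f}")
```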
- Multi-objective optimization determines when, which and how to fuse deep networks: an application to predict COVID-19 outcomes [1.8351254916713304]
We present a novel approach to optimize the setup of a multimodal end-to-end model.
We test our method on the AIforCOVID dataset, attaining state-of-the-art results.
arXiv Detail & Related papers (2022-04-07T23:07:33Z)