MEDBind: Unifying Language and Multimodal Medical Data Embeddings
- URL: http://arxiv.org/abs/2403.12894v2
- Date: Wed, 20 Mar 2024 04:56:03 GMT
- Title: MEDBind: Unifying Language and Multimodal Medical Data Embeddings
- Authors: Yuan Gao, Sangwook Kim, David E Austin, Chris McIntosh
- Abstract summary: We present MEDBind (Medical Electronic patient recorD), which learns joint embeddings across CXR, ECG, and medical text.
Using text data as the central anchor, MEDBind features tri-modality binding, delivering competitive performance in top-K retrieval, zero-shot, and few-shot benchmarks.
- Score: 18.954939735299963
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical vision-language pretraining models (VLPMs) have achieved remarkable progress in fusing chest X-rays (CXR) with clinical texts, introducing image-text binding approaches that enable zero-shot learning and downstream clinical tasks. However, the current landscape lacks holistic integration of additional medical modalities, such as electrocardiograms (ECG). We present MEDBind (Medical Electronic patient recorD), which learns joint embeddings across CXR, ECG, and medical text. Using text data as the central anchor, MEDBind features tri-modality binding, delivering competitive performance against established VLPMs in top-K retrieval, zero-shot, and few-shot benchmarks, as well as CXR-to-ECG zero-shot classification and retrieval. This seamless integration is achieved by combining a contrastive loss on modality-text pairs with our proposed contrastive loss function, the Edge-Modality Contrastive Loss, fostering a cohesive embedding space for CXR, ECG, and text. Finally, we demonstrate that MEDBind can improve downstream tasks by directly integrating CXR and ECG embeddings into a large language model for multimodal prompt tuning.
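The abstract does not spell out the exact form of the Edge-Modality Contrastive Loss, but a minimal sketch of text-anchored tri-modality binding might look like the following, assuming standard symmetric InfoNCE terms for the modality-text pairs and for the CXR-ECG edge (the function names, temperature, and weighting `lam` are illustrative assumptions, not the authors' code):

```python
# Hypothetical sketch of tri-modality binding with a text anchor.
# The Edge-Modality Contrastive Loss is assumed here to be a standard
# symmetric InfoNCE applied directly to the CXR-ECG pair; the paper's
# exact formulation may differ.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings a[i] <-> b[i]."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def medbind_loss(z_cxr, z_ecg, z_txt, lam: float = 1.0):
    """Text-anchored binding plus an assumed CXR-ECG 'edge' term."""
    anchor = info_nce(z_cxr, z_txt) + info_nce(z_ecg, z_txt)
    edge = info_nce(z_cxr, z_ecg)                 # Edge-Modality Contrastive Loss (assumed form)
    return anchor + lam * edge

# Toy usage with random embeddings standing in for encoder outputs.
B, D = 8, 256
loss = medbind_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, D))
```

The edge term ties CXR and ECG embeddings together directly, which is what would allow CXR-to-ECG zero-shot classification and retrieval without a text intermediary at inference time.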
Related papers
- MoRE: Multi-Modal Contrastive Pre-training with Transformers on X-Rays, ECGs, and Diagnostic Report [4.340464264725625]
We introduce a novel Multi-Modal Contrastive Pre-training Framework that synergistically combines X-rays, electrocardiograms (ECGs), and radiology/cardiology reports.
We utilize LoRA-PEFT to significantly reduce trainable parameters in the LLM and incorporate a recent linear attention-dropping strategy in the Vision Transformer (ViT) for smoother attention.
To the best of our knowledge, we are the first to propose an integrated model that combines X-ray, ECG, and radiology/cardiology reports with this approach.
arXiv Detail & Related papers (2024-10-21T17:42:41Z)
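As an illustration of the LoRA-PEFT step, a minimal sketch with the Hugging Face `peft` library is shown below; the base model, rank, and target modules are placeholders, not choices from the paper:

```python
# Minimal sketch of LoRA fine-tuning via the Hugging Face `peft` library,
# as MoRE reports using LoRA-PEFT to shrink the LLM's trainable parameters.
# Model name and hyperparameters are illustrative, not from the paper.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in LLM

config = LoraConfig(
    r=8,                       # low-rank update dimension
    lora_alpha=16,             # scaling factor for the LoRA update
    lora_dropout=0.05,
    target_modules=["c_attn"], # GPT-2 attention projection; model-specific
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of weights remain trainable
```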
- C-MELT: Contrastive Enhanced Masked Auto-Encoders for ECG-Language Pre-Training [10.088785685439134]
We propose C-MELT, a framework that pre-trains ECG and text data using a contrastive masked auto-encoder architecture.
C-MELT uniquely combines the strengths of generative modeling with enhanced discriminative capabilities to achieve robust cross-modal representations.
arXiv Detail & Related papers (2024-10-03T01:24:09Z)
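A rough sketch of such a contrastive masked auto-encoder objective, combining a masked-reconstruction (generative) term with an ECG-text InfoNCE (discriminative) term; the toy encoder, mask ratio, and equal loss weighting are assumptions:

```python
# Minimal sketch in the spirit of C-MELT: reconstruct masked ECG samples
# (generative term) while aligning the pooled ECG embedding with a paired
# text embedding (discriminative term). Architecture is a stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedECGEncoder(nn.Module):
    def __init__(self, sig_len=500, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(sig_len, dim), nn.GELU())
        self.decoder = nn.Linear(dim, sig_len)

    def forward(self, ecg, mask_ratio=0.5):
        mask = torch.rand_like(ecg) < mask_ratio       # True where hidden
        z = self.encoder(ecg * (~mask))                # encode visible samples only
        recon = self.decoder(z)
        rec_loss = F.mse_loss(recon[mask], ecg[mask])  # generative term on masked part
        return z, rec_loss

def c_melt_step(model, ecg, z_txt, temperature=0.07):
    z_ecg, rec_loss = model(ecg)
    a = F.normalize(z_ecg, dim=-1)
    b = F.normalize(z_txt, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    con_loss = F.cross_entropy(logits, targets)        # discriminative term
    return rec_loss + con_loss

loss = c_melt_step(MaskedECGEncoder(), torch.randn(4, 500), torch.randn(4, 128))
```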
- Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection [8.500041312027596]
We propose a novel variational autoencoder ($\text{CardioVAE}_{\text{X,G}}$) to integrate low-cost chest X-ray (CXR) and electrocardiogram (ECG) modalities with pre-training.
Our model also excels in producing fine-grained interpretations of predictions directly associated with clinical features.
arXiv Detail & Related papers (2024-03-20T15:06:49Z)
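A minimal sketch of a bi-modal variational autoencoder over CXR and ECG features is shown below; the simple averaging fusion of the Gaussian parameters and all dimensions are illustrative assumptions (the paper's $\text{CardioVAE}_{\text{X,G}}$ may fuse modalities differently):

```python
# Hypothetical bi-modal VAE: each modality is encoded to Gaussian
# parameters, fused (here by naive averaging; mixture/product-of-experts
# are common alternatives), and both modalities are reconstructed from
# one shared latent.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiModalVAE(nn.Module):
    def __init__(self, d_cxr=512, d_ecg=256, d_z=64):
        super().__init__()
        self.enc_cxr = nn.Linear(d_cxr, 2 * d_z)   # -> (mu, logvar)
        self.enc_ecg = nn.Linear(d_ecg, 2 * d_z)
        self.dec_cxr = nn.Linear(d_z, d_cxr)
        self.dec_ecg = nn.Linear(d_z, d_ecg)

    def forward(self, x_cxr, x_ecg):
        mu1, lv1 = self.enc_cxr(x_cxr).chunk(2, dim=-1)
        mu2, lv2 = self.enc_ecg(x_ecg).chunk(2, dim=-1)
        mu, lv = (mu1 + mu2) / 2, (lv1 + lv2) / 2            # naive fusion (assumed)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * lv)  # reparameterization trick
        recon = F.mse_loss(self.dec_cxr(z), x_cxr) + F.mse_loss(self.dec_ecg(z), x_ecg)
        kl = -0.5 * torch.mean(1 + lv - mu.pow(2) - lv.exp())
        return recon + kl                                     # negative ELBO

loss = BiModalVAE()(torch.randn(4, 512), torch.randn(4, 256))
```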
- Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement [10.611952462532908]
Multimodal ECG Representation Learning (MERL) is capable of performing zero-shot ECG classification with text prompts.
We propose the Clinical Knowledge Enhanced Prompt Engineering (CKEPE) approach to exploit external expert-verified clinical knowledge databases.
MERL achieves an average AUC score of 75.2% in zero-shot classification (without training data), 3.2% higher than linearly probed eSSL methods with 10% annotated training data, averaged across all six datasets.
arXiv Detail & Related papers (2024-03-11T12:28:55Z)
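A minimal sketch of prompt-based zero-shot classification as MERL performs for ECG: each class is described by a text prompt, and a recording is assigned to the class whose prompt embedding it is most similar to. The stub encoders, toy tokenizer, and prompt wording below are illustrative, not MERL's:

```python
# Zero-shot ECG classification via text prompts: score each recording
# against one prompt embedding per class and pick the most similar.
import torch
import torch.nn.functional as F

classes = ["atrial fibrillation", "sinus rhythm", "myocardial infarction"]
prompts = [f"ECG showing {c}" for c in classes]

# Stand-ins for the trained ECG and text encoders.
ecg_encoder = torch.nn.Linear(500, 128)
txt_encoder = torch.nn.EmbeddingBag(1000, 128)   # toy bag-of-tokens text encoder

def tokenize(text: str) -> torch.Tensor:
    return torch.tensor([hash(w) % 1000 for w in text.split()])

with torch.no_grad():
    z_txt = torch.stack([txt_encoder(tokenize(p).unsqueeze(0)).squeeze(0)
                         for p in prompts])       # (C, 128) class prototypes
    z_ecg = ecg_encoder(torch.randn(4, 500))      # (B, 128) recording embeddings
    sims = F.normalize(z_ecg, dim=-1) @ F.normalize(z_txt, dim=-1).t()
    print([classes[i] for i in sims.argmax(dim=-1).tolist()])
```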
- XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z)
- Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information.
The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
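A hedged sketch of that architecture, with a tiny CNN backbone, demographic-token embeddings concatenated into the decoder memory, and all dimensions chosen arbitrarily:

```python
# Illustrative report generator: a CNN produces visual features from a
# CXR, demographic information is embedded as tokens, and a transformer
# decoder attends over both while generating report tokens.
import torch
import torch.nn as nn

class ReportGenerator(nn.Module):
    def __init__(self, vocab=5000, d=256):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, d, 7, stride=4), nn.AdaptiveAvgPool2d(4))
        self.demo_embed = nn.Embedding(vocab, d)   # demographic tokens
        self.tok_embed = nn.Embedding(vocab, d)    # report tokens
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d, vocab)

    def forward(self, cxr, demo_tokens, report_tokens):
        vis = self.cnn(cxr).flatten(2).transpose(1, 2)           # (B, 16, d) visual features
        memory = torch.cat([vis, self.demo_embed(demo_tokens)], dim=1)
        out = self.decoder(self.tok_embed(report_tokens), memory)
        return self.head(out)                                    # next-token logits

logits = ReportGenerator()(torch.randn(2, 1, 224, 224),
                           torch.randint(0, 5000, (2, 8)),
                           torch.randint(0, 5000, (2, 32)))
```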
- Connecting Multi-modal Contrastive Representations [50.26161419616139]
Multi-modal Contrastive Representation (MCR) learning aims to encode different modalities into a semantically shared space.
This paper proposes Connecting Multi-modal Contrastive Representations (C-MCR), a novel training-efficient method for learning MCR without paired data.
C-MCR achieves state-of-the-art audio-visual performance on audio-image retrieval, audio-visual source localization, and counterfactual audio-image recognition tasks.
arXiv Detail & Related papers (2023-05-22T09:44:39Z)
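A rough sketch of the core idea, assuming two frozen contrastive spaces that overlap on text (e.g. CLIP's image-text and CLAP's audio-text spaces): unpaired text alone can supervise two small projectors into a new shared space. The frozen encoders are stubbed with random linear maps:

```python
# Connecting two frozen contrastive spaces through their shared text
# modality: the same sentence passed through both text encoders forms a
# positive pair for training the projectors, so no audio-image pairs are
# needed. Encoder stubs and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_clip, d_clap, d_shared = 512, 384, 256
clip_text = nn.Linear(300, d_clip).requires_grad_(False)   # frozen stub encoder
clap_text = nn.Linear(300, d_clap).requires_grad_(False)   # frozen stub encoder
proj_a = nn.Linear(d_clip, d_shared)                       # trainable connectors
proj_b = nn.Linear(d_clap, d_shared)

opt = torch.optim.Adam([*proj_a.parameters(), *proj_b.parameters()], lr=1e-3)
for _ in range(100):
    texts = torch.randn(32, 300)            # stand-in for a batch of sentences
    za = F.normalize(proj_a(clip_text(texts)), dim=-1)
    zb = F.normalize(proj_b(clap_text(texts)), dim=-1)
    logits = za @ zb.t() / 0.07             # same sentence through both spaces
    loss = F.cross_entropy(logits, torch.arange(32))  # pull matched pairs together
    opt.zero_grad(); loss.backward(); opt.step()
# Afterwards, proj_a(image emb) and proj_b(audio emb) live in one space,
# enabling audio-image retrieval without any paired audio-image data.
```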
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with training datasets of varying size comprising paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
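Text-to-image retrieval is typically scored with Recall@K; a minimal sketch of the metric follows, with random embeddings standing in for model outputs:

```python
# Recall@K for text-to-image retrieval: for each report embedding, rank
# all image embeddings by cosine similarity and check whether the true
# match appears among the top K.
import torch
import torch.nn.functional as F

def recall_at_k(z_txt: torch.Tensor, z_img: torch.Tensor, k: int) -> float:
    """z_txt[i] is paired with z_img[i]; returns Recall@K over the batch."""
    sims = F.normalize(z_txt, dim=-1) @ F.normalize(z_img, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices                  # (N, k) ranked image indices
    match = torch.arange(z_txt.size(0)).unsqueeze(1)     # true pair index per query
    return (topk == match).any(dim=-1).float().mean().item()

N, D = 100, 128
z_txt, z_img = torch.randn(N, D), torch.randn(N, D)
print({k: recall_at_k(z_txt, z_img, k) for k in (1, 5, 10)})
```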
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity [77.78624443410216]
We propose improved FDAM algorithms for detection tasks on heterogeneous chest data.
A key result of this paper is that the communication complexity of the proposed algorithms is strongly independent of the number of machines and of the target accuracy level.
Experiments have demonstrated the effectiveness of our FDAM algorithm on benchmark datasets and on medical chest X-ray images from different organizations.
arXiv Detail & Related papers (2021-02-09T04:05:19Z)
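A hedged sketch of the overall federated AUC-maximization structure: clients run local steps on a pairwise squared-hinge AUC surrogate and the server averages parameters. The paper's actual algorithm uses a min-max reformulation with stronger communication guarantees; this only illustrates the shape of the computation:

```python
# Federated AUC maximization, schematically: local surrogate-loss steps
# per client plus FedAvg-style parameter averaging. The surrogate choice
# and averaging scheme are assumptions, not the paper's algorithm.
import torch
import torch.nn as nn

def auc_surrogate(scores, labels, margin=1.0):
    """Pairwise squared hinge: positives should outscore negatives by `margin`."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)       # all positive-negative pairs
    return torch.clamp(margin - diff, min=0).pow(2).mean()

def fed_round(global_model, client_data, local_steps=5, lr=0.01):
    states = []
    for x, y in client_data:                         # one (x, y) batch per client
        model = nn.Linear(20, 1)
        model.load_state_dict(global_model.state_dict())
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(local_steps):
            loss = auc_surrogate(model(x).squeeze(-1), y)
            opt.zero_grad(); loss.backward(); opt.step()
        states.append(model.state_dict())
    # Server: average client parameters into the global model.
    avg = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)

model = nn.Linear(20, 1)
clients = [(torch.randn(64, 20), torch.randint(0, 2, (64,))) for _ in range(3)]
fed_round(model, clients)
```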