Teach Multimodal LLMs to Comprehend Electrocardiographic Images
- URL: http://arxiv.org/abs/2410.19008v1
- Date: Mon, 21 Oct 2024 20:26:41 GMT
- Title: Teach Multimodal LLMs to Comprehend Electrocardiographic Images
- Authors: Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang
- Abstract summary: We introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples.
We also develop PULSE, an MLLM tailored for ECG image comprehension.
Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%.
- Score: 10.577263066644194
- Abstract: The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.
Related papers
- High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2 [0.0]
This paper explores a practical approach to enhance ECG image interpretation using the multimodal LLaMA 3.2 model.
We used a parameter-efficient fine-tuning strategy, Low-Rank Adaptation (LoRA), specifically designed to boost the model's ability to understand ECG images.
arXiv Detail & Related papers (2025-01-30T17:55:27Z)
- CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals.
Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality.
The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
- De-biased Multimodal Electrocardiogram Analysis [20.290531515033518]
Multimodal large language models (MLLMs) are increasingly being applied in the medical field.
Previous studies have attempted to address this by converting ECGs into several text tags.
In this work, we directly feed the embeddings of ECGs into the LLM through a projection layer.
arXiv Detail & Related papers (2024-11-22T08:35:35Z)
- AnyECG: Foundational Models for Electrocardiogram Analysis [36.53693619144332]
Electrocardiogram (ECG) is highly sensitive in detecting acute heart attacks.
This paper introduces AnyECG, a foundational model designed to extract robust representations from any real-world ECG data.
Experimental results in anomaly detection, arrhythmia detection, corrupted lead generation, and ultra-long ECG signal analysis demonstrate that AnyECG learns common ECG knowledge from data and significantly outperforms cutting-edge methods in each respective task.
arXiv Detail & Related papers (2024-11-17T17:32:58Z)
- Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning [19.513904491604794]
Electrocardiogram (ECG) interpretation requires specialized expertise.
This work introduces a novel multimodal meta-learning method for few-shot ECG question answering.
arXiv Detail & Related papers (2024-10-18T13:48:01Z)
- ECG-FM: An Open Electrocardiogram Foundation Model [3.611746032873298]
We present ECG-FM, an open foundation model for ECG analysis.
ECG-FM adopts a transformer-based architecture and is pretrained on 2.5 million samples.
We show how its command of contextual information results in strong performance, rich pretrained embeddings, and reliable interpretability.
arXiv Detail & Related papers (2024-08-09T17:06:49Z)
- MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [41.324530807795256]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions.
Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation.
We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z)
- Exploring scalable medical image encoders beyond text supervision [42.86944965225041]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images.
We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z)
- Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report [28.608260758775316]
We introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models.
We propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data.
Our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
arXiv Detail & Related papers (2023-04-13T06:32:25Z)
- Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
- ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, which comprises 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.