Related papers: Teach Multimodal LLMs to Comprehend Electrocardiographic Images

URL: http://arxiv.org/abs/2410.19008v1
Date: Mon, 21 Oct 2024 20:26:41 GMT
Title: Teach Multimodal LLMs to Comprehend Electrocardiographic Images
Authors: Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang
Abstract summary: We introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples. We also develop PULSE, an MLLM tailored for ECG image comprehension. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%.
Score: 10.577263066644194
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.

Related papers

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [43.65650710265957]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations. We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv Detail & Related papers (2025-03-08T05:48:53Z)
Comparing Deep Neural Network for Multi-Label ECG Diagnosis From Scanned ECG [1.2499537119440243]
We evaluate the performance of multiple deep neural network architectures, including AlexNet, VGG, ResNet, and Vision Transformer, on scanned ECG datasets. Our comparative analysis examines model accuracy, robustness to image artifacts, and generalizability across different ECG conditions. The findings highlight the strengths and limitations of each architecture, providing insights into the feasibility of image-based ECG diagnosis.
arXiv Detail & Related papers (2025-02-19T02:56:27Z)
High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2 [0.0]
This paper explores a practical approach to enhancing ECG image interpretation with the multimodal LLaMA 3.2 model. We used Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning strategy, to boost the model's ability to understand ECG images.
arXiv Detail & Related papers (2025-01-30T17:55:27Z)
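LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update B·A, scaled by alpha/r, so only A and B are trained. The pure-Python sketch below shows that forward pass for one linear layer. The class name, dimensions, rank, alpha, and the fixed values standing in for A's random initialisation are illustrative assumptions; this is not the fine-tuning code used in the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical frozen linear layer W plus a trainable low-rank update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float) -> None:
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, shape (d_out, d_in)
        self.scaling = alpha / rank   # LoRA scales the low-rank update by alpha / r
        # A: (rank, d_in). Real LoRA initialises A randomly; fixed small values stand in here.
        self.a = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(rank)]
        # B: (d_out, rank), zero-initialised so the adapted layer starts out identical to W.
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                 # W x, never updated
        update = matvec(self.b, matvec(self.a, x))    # B (A x), the only trainable path
        return [h + self.scaling * u for h, u in zip(base, update)]

    def trainable_params(self) -> int:
        """Only A and B are trained; W stays frozen."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], rank=1, alpha=2.0)
    print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.0]: B is zero, so output equals W x
    print(layer.trainable_params())        # 5 = 1*3 (A) + 2*1 (B), versus 6 entries in W
```

The saving grows with layer size: for a d_out × d_in matrix, LoRA trains r·(d_in + d_out) values instead of d_in·d_out.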
CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
De-biased Multimodal Electrocardiogram Analysis [20.290531515033518]
Multimodal large language models (MLLMs) are increasingly being applied in the medical field. Previous studies have attempted to bring ECGs into MLLMs by converting them into several text tags. In this work, we directly feed the embeddings of ECGs into the LLM through a projection layer.
arXiv Detail & Related papers (2024-11-22T08:35:35Z)
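Feeding ECG embeddings "through a projection layer" generally means mapping each ECG feature vector into the LLM's token-embedding space so it can sit in the input sequence alongside text tokens. The sketch below assumes a single affine projection and simple prepending of the projected ECG tokens; the function names, dimensions, and prepending strategy are hypothetical and may differ from the paper's design.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine projection y = W x + b for a single vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_llm_inputs(
    ecg_embeddings: List[Vector],
    proj_weight: Matrix,
    proj_bias: Vector,
    text_embeddings: List[Vector],
) -> List[Vector]:
    """Project ECG embeddings to the LLM hidden size and prepend them to the text tokens."""
    d_model = len(proj_weight)
    if any(len(t) != d_model for t in text_embeddings):
        raise ValueError("text embeddings must have the LLM hidden size")
    # Each ECG embedding becomes one "soft token" in the LLM's embedding space.
    ecg_tokens = [linear(proj_weight, proj_bias, e) for e in ecg_embeddings]
    return ecg_tokens + text_embeddings


if __name__ == "__main__":
    proj_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # maps 2-d ECG features to a 3-d LLM space
    proj_b = [0.0, 0.0, 0.5]
    seq = build_llm_inputs([[1.0, 2.0]], proj_w, proj_b, [[0.1, 0.2, 0.3]])
    print(seq)  # [[1.0, 2.0, 3.5], [0.1, 0.2, 0.3]]
```

In a trained system the projection weights are learned, while the ECG encoder and the LLM may be frozen or fine-tuned depending on the design.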
Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning [19.513904491604794]
Electrocardiogram (ECG) interpretation requires specialized expertise. This work introduces a novel multimodal meta-learning method for few-shot ECG question answering.
arXiv Detail & Related papers (2024-10-18T13:48:01Z)
Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation [41.82319894067087]
We propose an inter-intra period-aware ECG representation learning approach. Because the ECGs of atrial fibrillation patients exhibit irregular RR intervals and absent P-waves, we develop specific pre-training tasks for inter-period and intra-period representations. Our approach achieves AUCs of 0.953 and 0.996 on the BTCH dataset for paroxysmal and persistent atrial fibrillation detection, respectively.
arXiv Detail & Related papers (2024-10-08T10:03:52Z)
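RR-interval irregularity, one of the atrial fibrillation cues named above, can be quantified with simple statistics on the times between successive R peaks. The sketch below computes RR intervals, their coefficient of variation, and RMSSD from R-peak times assumed to be already detected (in milliseconds). It only illustrates the signal property; it is not the paper's self-supervised pre-training method, and the function names are hypothetical.

```python
import statistics
from typing import List


def rr_intervals(r_peaks_ms: List[float]) -> List[float]:
    """Differences between consecutive R-peak times, in milliseconds."""
    return [b - a for a, b in zip(r_peaks_ms, r_peaks_ms[1:])]


def rr_cv(r_peaks_ms: List[float]) -> float:
    """Coefficient of variation of RR intervals: sample stdev divided by mean."""
    rr = rr_intervals(r_peaks_ms)
    if len(rr) < 2:
        raise ValueError("need at least three R peaks")
    return statistics.stdev(rr) / statistics.mean(rr)


def rmssd(r_peaks_ms: List[float]) -> float:
    """Root mean square of successive differences between RR intervals."""
    rr = rr_intervals(r_peaks_ms)
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    if not diffs:
        raise ValueError("need at least three R peaks")
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5


if __name__ == "__main__":
    regular = [0, 800, 1600, 2400, 3200]      # constant 800 ms RR intervals
    irregular = [0, 600, 1500, 1900, 2900]    # varying RR intervals
    print(rr_cv(regular), rr_cv(irregular))   # 0.0 for the regular rhythm, clearly positive otherwise
    print(rmssd(regular), rmssd(irregular))
```

A perfectly regular rhythm scores zero on both measures, while an irregularly irregular rhythm pushes both upward.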
ECG-FM: An Open Electrocardiogram Foundation Model [3.611746032873298]
We present ECG-FM, an open foundation model for ECG analysis. ECG-FM adopts a transformer-based architecture and is pretrained on 2.5 million samples. We show how its command of contextual information results in strong performance, rich pretrained embeddings, and reliable interpretability.
arXiv Detail & Related papers (2024-08-09T17:06:49Z)
VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation [0.7405975743268344]
In practice, ECG data is stored as either digitized signals or printed images. We propose VizECGNet, which uses only printed ECG graphics to determine the prognosis of multiple cardiovascular diseases.
arXiv Detail & Related papers (2024-08-06T01:34:43Z)
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [41.324530807795256]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation. We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z)
RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision [44.00149519249467]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images. We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z)
LOTUS: Learning to Optimize Task-based US representations [39.81131738128329]
Anatomical segmentation of organs in ultrasound images is essential to many clinical applications. Existing deep neural networks require a large amount of labeled data for training in order to achieve clinically acceptable performance. In this paper, we propose a novel approach for learning to optimize task-based ultrasound image representations.
arXiv Detail & Related papers (2023-07-29T16:29:39Z)
Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report [28.608260758775316]
We introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. We propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
arXiv Detail & Related papers (2023-04-13T06:32:25Z)
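Identifying "the most similar clinical cases" from an input ECG is commonly framed as nearest-neighbour search over embeddings. The sketch below assumes the ECG and the stored cases have already been encoded into fixed-length vectors (for example by a ViT-style encoder) and ranks cases by cosine similarity. The function names, the dictionary-based index, and the choice of cosine similarity are assumptions for illustration, not the paper's stated retrieval method.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_cases(
    query: List[float], case_index: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank stored cases by similarity to the query ECG embedding and return the top k."""
    scored = [(case_id, cosine_similarity(query, emb)) for case_id, emb in case_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    index = {"case_a": [1.0, 0.0], "case_b": [0.0, 1.0], "case_c": [1.0, 1.0]}
    for case_id, score in most_similar_cases([1.0, 0.1], index, k=2):
        print(case_id, round(score, 3))  # case_a first, then case_c
```

A linear scan is fine for a sketch; a large case archive would normally use an approximate nearest-neighbour index instead.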
Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe). We propose a versatile framework that fuses appearance-based semi-supervision, mask-based adversarial domain adaptation, and pseudo-labeling. CHASe can further improve pathological liver mask Dice-Sorensen coefficients by 4.2% to 9.4%.
arXiv Detail & Related papers (2020-05-27T06:58:39Z)
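The Dice-Sorensen coefficient used to score the liver masks is 2|P ∩ T| / (|P| + |T|) for a predicted mask P and a reference mask T. The sketch below computes it on flattened binary masks; treating two empty masks as a perfect match (score 1.0) is a common convention assumed here, not something stated in the summary.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice-Sorensen coefficient 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * intersection / total


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5: one shared pixel, four foreground pixels in total
```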
ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining importance in academic and industrial settings. We demonstrate that DL can be successfully applied to low-interpretative tasks by embedding ECG detection and delineation into a segmentation framework. The model was trained using PhysioNet's QT database, which comprises 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.