Teach Multimodal LLMs to Comprehend Electrocardiographic Images
- URL: http://arxiv.org/abs/2410.19008v1
- Date: Mon, 21 Oct 2024 20:26:41 GMT
- Title: Teach Multimodal LLMs to Comprehend Electrocardiographic Images
- Authors: Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang
- Abstract summary: We introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples.
We also develop PULSE, an MLLM tailored for ECG image comprehension.
Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%.
- Score: 10.577263066644194
- Abstract: The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.
Related papers
- Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning [19.513904491604794]
Electrocardiogram (ECG) interpretation requires specialized expertise.
This work introduces a novel multimodal meta-learning method for few-shot ECG question answering.
arXiv Detail & Related papers (2024-10-18T13:48:01Z) - Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation [41.82319894067087]
We propose an inter-intra period-aware ECG representation learning approach.
Considering ECGs of atrial fibrillation patients exhibit the irregularity in RR intervals and the absence of P-waves, we develop specific pre-training tasks for interperiod and intraperiod representations.
Our approach demonstrates remarkable AUC performances on the BTCH dataset, i.e., 0.953/0.996 for paroxysmal/persistent atrial fibrillation detection.
arXiv Detail & Related papers (2024-10-08T10:03:52Z) - ECG-FM: An Open Electrocardiogram Foundation Model [3.611746032873298]
We present ECG-FM, an open foundation model for ECG analysis.
ECG-FM adopts a transformer-based architecture and is pretrained on 2.5 million samples.
We show how its command of contextual information results in strong performance, rich pretrained embeddings, and reliable interpretability.
arXiv Detail & Related papers (2024-08-09T17:06:49Z) - VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation [0.7405975743268344]
In practice, ECG data is stored as either digitized signals or printed images.
We propose VizECGNet, which uses only printed ECG graphics to determine the prognosis of multiple cardiovascular diseases.
arXiv Detail & Related papers (2024-08-06T01:34:43Z) - MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [41.324530807795256]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions.
Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation.
We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z) - RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision [44.00149519249467]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images.
We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z) - ECGBERT: Understanding Hidden Language of ECGs with Self-Supervised Representation Learning [6.0106590095197605]
ECGBERT is a self-supervised representation learning approach that unlocks the underlying language of ECGs.
We demonstrate ECGBERT's potential to achieve state-of-the-art results on a wide variety of tasks.
arXiv Detail & Related papers (2023-06-10T04:23:08Z) - PulseNet: Deep Learning ECG-signal classification using random augmentation policy and continous wavelet transform for canines [46.09869227806991]
Evaluating canine electrocardiograms (ECGs) requires skilled veterinarians.
Current availability of veterinary cardiologists for ECG interpretation and diagnostic support is limited.
We implement a deep convolutional neural network (CNN) approach for classifying canine electrocardiogram sequences as either normal or abnormal.
arXiv Detail & Related papers (2023-05-17T09:06:39Z) - Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report [28.608260758775316]
We introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models.
We propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data.
Our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
arXiv Detail & Related papers (2023-04-13T06:32:25Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z) - ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)