Related papers: GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

URL: http://arxiv.org/abs/2503.06073v1
Date: Sat, 08 Mar 2025 05:48:53 GMT
Title: GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
Authors: Xiang Lan, Feng Wu, Kai He, Qinghao Zhao, Shenda Hong, Mengling Feng,
Abstract summary: We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation.<n> GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations.<n>We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
Score: 43.65650710265957
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While recent multimodal large language models (MLLMs) have advanced automated ECG interpretation, they still face two key limitations: (1) insufficient multimodal synergy between time series signals and visual ECG representations, and (2) limited explainability in linking diagnoses to granular waveform evidence. We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations: a dual-encoder framework extracting complementary time series and image features, cross-modal alignment for effective multimodal understanding, and knowledge-guided instruction generation for generating high-granularity grounding data (ECG-Grounding) linking diagnoses to measurable parameters ($e.g.$, QRS/PR Intervals). Additionally, we propose the Grounded ECG Understanding task, a clinically motivated benchmark designed to comprehensively assess the MLLM's capability in grounded ECG understanding. Experimental results on both existing and our proposed benchmarks show GEM significantly improves predictive performance (CSN $7.4\% \uparrow$), explainability ($22.7\% \uparrow$), and grounding ($24.8\% \uparrow$), making it more suitable for real-world clinical applications. GitHub repository: https://github.com/lanxiang1017/GEM.git

Related papers

MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations [12.00096975933262]
Electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders.<n>Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings.<n>We introduce MEETI, the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models.
arXiv Detail & Related papers (2025-07-21T05:32:44Z)
From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining [22.214252217020174]
We introduce MELP, a novel Multi-scale ECG-Language Pretraining (MELP) model that fully leverages hierarchical supervision from ECG-text pairs.<n>We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning.
arXiv Detail & Related papers (2025-06-11T07:22:17Z)
Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling [50.58126509704037]
Heartcare Suite is a framework for fine-grained electrocardiogram (ECG) understanding.<n>Heartcare-220K is a high-quality, structured, and comprehensive multimodal ECG dataset.<n>Heartcare-Bench is a benchmark to guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios.
arXiv Detail & Related papers (2025-06-06T07:56:41Z)
Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework [18.95201514457046]
Large language models (LLMs) have been applied to electrocardiogram (ECG) interpretation.<n>Electrocardiogram-Language Models (ELMs) emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans.<n>We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics.
arXiv Detail & Related papers (2025-05-24T19:43:15Z)
xLSTM-ECG: Multi-label ECG Classification via Feature Fusion with xLSTM [14.02717596836022]
We propose xLSTM-ECG, a novel approach for multi-label classification of ECG signals. To the best of our knowledge, this work represents the first design and application of xLSTM modules specifically adapted for multi-label ECG classification.
arXiv Detail & Related papers (2025-04-14T16:12:46Z)
CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z)
GAF-FusionNet: Multimodal ECG Analysis via Gramian Angular Fields and Split Attention [4.673285689826945]
This paper introduces a novel framework for ECG classification that integrates time-series analysis with image-based representation.<n>We evaluate ECG-FusionNet on three diverse datasets: ECG200, ECG5000, and the MIT-BIH Arrhythmia Database.<n>Results demonstrate significant improvements over state-of-the-art methods, with our model achieving 94.5%, 96.9%, and 99.6% accuracy on the respective datasets.
arXiv Detail & Related papers (2024-12-07T07:02:16Z)
ECG-FM: An Open Electrocardiogram Foundation Model [3.611746032873298]
We present ECG-FM, an open foundation model for ECG analysis. ECG-FM adopts a transformer-based architecture and is pretrained on 2.5 million samples. We show how its command of contextual information results in strong performance, rich pretrained embeddings, and reliable interpretability.
arXiv Detail & Related papers (2024-08-09T17:06:49Z)
ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text [14.06147507373525]
This study introduces a new multimodal contrastive pretaining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals. Our framework comprises two key components, including Cardio Query Assistant (CQA) and ECG Semantics Integrator(ESI)
arXiv Detail & Related papers (2024-05-26T06:45:39Z)
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [41.324530807795256]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation. We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z)
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z)
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG) CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure. Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent. This article delves has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based in probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces. Second, two novel segmentation-based loss functions have been developed, which attempt at enforcing the prediction of an exact number of independent structures and at producing closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings. We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework. The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.