Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework
- URL: http://arxiv.org/abs/2505.18847v1
- Date: Sat, 24 May 2025 19:43:15 GMT
- Title: Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework
- Authors: William Han, Chaojing Duan, Zhepeng Cen, Yihang Yao, Xiaoyu Song, Atharva Mhaskar, Dylan Leong, Michael A. Rosenberg, Emerson Liu, Ding Zhao
- Abstract summary: Large language models (LLMs) have been applied to electrocardiogram (ECG) interpretation. Electrocardiogram-Language Models (ELMs) emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans. We present the first comprehensive benchmark of candidate ECG input representations (signal, image, and symbolic) across 6 public datasets and 5 evaluation metrics.
- Score: 18.95201514457046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances have increasingly applied large language models (LLMs) to electrocardiogram (ECG) interpretation, giving rise to Electrocardiogram-Language Models (ELMs). Conditioned on an ECG and a textual query, an ELM autoregressively generates a free-form textual response. Unlike traditional classification-based systems, ELMs emulate expert cardiac electrophysiologists by issuing diagnoses, analyzing waveform morphology, identifying contributing factors, and proposing patient-specific action plans. To realize this potential, researchers are curating instruction-tuning datasets that pair ECGs with textual dialogues and are training ELMs on these resources. Yet before scaling ELMs further, a fundamental question remains unexplored: What is the most effective ECG input representation? In recent works, three candidate representations have emerged: raw time-series signals, rendered images, and discretized symbolic sequences. We present the first comprehensive benchmark of these modalities across 6 public datasets and 5 evaluation metrics. We find symbolic representations achieve the greatest number of statistically significant wins over both signal and image inputs. We further ablate the LLM backbone, ECG duration, and token budget, and we evaluate robustness to signal perturbations. We hope that our findings offer clear guidance for selecting input representations when developing the next generation of ELMs.
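To make the three candidate representations concrete, the sketch below converts a toy single-lead ECG into each form: a z-normalized time-series array, a rendered waveform image, and a quantile-binned symbolic token sequence. The function names, the 16-bin quantization scheme, and the plotting details are illustrative assumptions, not the preprocessing pipeline used in the paper.

```python
# Illustrative sketch of the three ECG input representations compared in the paper.
# All design choices here (normalization, bin count, token format) are assumptions.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt


def as_signal(ecg: np.ndarray) -> np.ndarray:
    """Raw time-series representation: z-normalized samples fed directly to the model."""
    return (ecg - ecg.mean()) / (ecg.std() + 1e-8)


def as_image(ecg: np.ndarray, path: str = "ecg.png") -> str:
    """Rendered-image representation: plot the waveform and save it as a picture."""
    fig, ax = plt.subplots(figsize=(6, 2), dpi=100)
    ax.plot(ecg, linewidth=0.8)
    ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return path


def as_symbolic(ecg: np.ndarray, n_bins: int = 16) -> list[str]:
    """Discretized symbolic representation: map each sample to one of n_bins tokens."""
    z = as_signal(ecg)
    edges = np.quantile(z, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(z, edges)          # integer symbol per sample
    return [f"<ecg_{b}>" for b in bins]   # token strings an LLM tokenizer could ingest


if __name__ == "__main__":
    t = np.linspace(0, 10, 5000)                                        # 10 s at 500 Hz
    ecg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.randn(t.size)  # toy waveform
    print(as_signal(ecg).shape, as_image(ecg), as_symbolic(ecg)[:5])
```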
Related papers
- MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations [12.00096975933262]
The electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. We introduce MEETI, the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models.
arXiv Detail & Related papers (2025-07-21T05:32:44Z) - From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining [22.214252217020174]
We introduce MELP, a novel Multi-scale ECG-Language Pretraining model that fully leverages hierarchical supervision from ECG-text pairs. We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning.
arXiv Detail & Related papers (2025-06-11T07:22:17Z) - BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals [50.76802709706976]
This paper proposes BrainOmni, the first brain foundation model that generalises across heterogeneous EEG and MEG recordings. To unify diverse data sources, we introduce BrainTokenizer, the first tokenizer that quantises neural brain activity into discrete representations. A total of 1,997 hours of EEG and 656 hours of MEG data are curated and standardised from publicly available sources for pretraining.
arXiv Detail & Related papers (2025-05-18T14:07:14Z) - GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [43.65650710265957]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images, and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations. We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv Detail & Related papers (2025-03-08T05:48:53Z) - CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information [61.1904164368732]
We propose CognitionCapturer, a unified framework that fully leverages multimodal data to represent EEG signals. Specifically, CognitionCapturer trains Modality Experts for each modality to extract cross-modal information from the EEG modality. The framework does not require any fine-tuning of the generative models and can be extended to incorporate more modalities.
arXiv Detail & Related papers (2024-12-13T16:27:54Z) - ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text [14.06147507373525]
This study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals.
Our framework comprises two key components: the Cardio Query Assistant (CQA) and the ECG Semantics Integrator (ESI).
arXiv Detail & Related papers (2024-05-26T06:45:39Z) - MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [41.324530807795256]
The electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions.
Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation.
We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z) - Improving Diffusion Models for ECG Imputation with an Augmented Template Prior [43.6099225257178]
Noisy and poor-quality recordings are a major issue for signals collected using mobile health systems.
Recent studies have explored the imputation of missing values in ECG with probabilistic time-series models.
We present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.
arXiv Detail & Related papers (2023-10-24T11:34:15Z) - DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment.
Current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images.
This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z) - Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports [6.659609788411503]
We present a text-to-ECG task, in which textual inputs are used to produce ECG outputs.
We propose Auto-TTE, an autoregressive generative model conditioned on clinical text reports to synthesize 12-lead ECGs.
arXiv Detail & Related papers (2023-03-09T11:58:38Z) - Learning ECG Representations based on Manipulated Temporal-Spatial Reverse Detection [11.615287369669971]
We propose a straightforward but effective approach to learning ECG representations.
Inspired by the temporal and spatial characteristics of ECG, we flip the original signals horizontally, vertically, and both horizontally and vertically (see the sketch after this entry).
Results show that the ECG representations learned with our method lead to strong performance on the downstream task.
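A minimal sketch of the flip-based views described above, assuming a leads-by-samples NumPy array and interpreting "horizontal" as time reversal and "vertical" as amplitude inversion; this reading is inferred from the abstract, not taken from the authors' code.

```python
# Flip-based ECG views: the axis interpretation below is an assumption for illustration.
import numpy as np


def flip_views(ecg: np.ndarray) -> dict[str, np.ndarray]:
    """Return the original signal plus its three flipped variants (shape: leads x samples)."""
    return {
        "original": ecg,
        "horizontal": ecg[:, ::-1],   # reverse along the time axis
        "vertical": -ecg,             # invert the amplitude
        "both": -ecg[:, ::-1],        # reverse time and invert amplitude
    }


if __name__ == "__main__":
    demo = np.random.randn(12, 5000)  # toy 12-lead, 10 s at 500 Hz recording
    for name, view in flip_views(demo).items():
        print(name, view.shape)
```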
arXiv Detail & Related papers (2022-02-25T02:01:09Z) - Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, insufficient both in size and in the range of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces from "pools" of fundamental segments, as cropped from the original databases, together with a set of rules for their arrangement into coherent synthetic traces (see the sketch after this entry).
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce tighter segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
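As an illustration of the pseudo-synthetic composition idea referenced above, the sketch below draws fundamental segments (P, QRS, T) from per-type "pools" and concatenates them, separated by short isoelectric gaps, into a longer trace. The pool contents, ordering rule, and gap lengths are toy assumptions; the paper's actual composition rules are more elaborate.

```python
# Toy pseudo-synthetic ECG composition from segment pools; all waveforms are placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Toy pools: each entry stands in for a short waveform cropped from a source database.
pools = {
    "P":   [0.10 * np.hanning(40), 0.12 * np.hanning(50)],
    "QRS": [np.concatenate([-0.2 * np.hanning(10), np.hanning(20), -0.3 * np.hanning(10)])],
    "T":   [0.30 * np.hanning(80), 0.25 * np.hanning(70)],
}


def synth_beat() -> np.ndarray:
    """Compose one beat by sampling one segment of each type in P -> QRS -> T order."""
    parts = [pools[k][rng.integers(len(pools[k]))] for k in ("P", "QRS", "T")]
    gaps = [np.zeros(rng.integers(10, 30)) for _ in range(4)]  # isoelectric baseline between waves
    return np.concatenate([gaps[0], parts[0], gaps[1], parts[1], gaps[2], parts[2], gaps[3]])


def synth_trace(n_beats: int = 8) -> np.ndarray:
    """Concatenate several synthetic beats into a longer trace."""
    return np.concatenate([synth_beat() for _ in range(n_beats)])


if __name__ == "__main__":
    print(synth_trace().shape)
```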
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.