Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
- URL: http://arxiv.org/abs/2509.25095v1
- Date: Mon, 29 Sep 2025 17:29:48 GMT
- Title: Benchmarking ECG Foundational Models: A Reality Check Across Clinical Tasks
- Authors: M A Al-Masud, Juan Miguel Lopez Alcaraz, Nils Strodthoff
- Abstract summary: Foundation models promise broader adaptability, but their generalization across diverse ECG tasks is not well understood. We benchmarked eight ECG foundation models on 26 clinically relevant tasks using 12 public datasets. While foundation models show promise for adult ECG analysis, substantial gaps remain in cardiac structure, outcome prediction, and patient characterization.
- Score: 1.6873748786804317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The 12-lead electrocardiogram (ECG) is a long-standing diagnostic tool. Yet machine learning for ECG interpretation remains fragmented, often limited to narrow tasks or datasets. Foundation models promise broader adaptability, but their generalization across diverse ECG tasks is not well understood. We benchmarked eight ECG foundation models on 26 clinically relevant tasks using 12 public datasets comprising 1,650 regression and classification targets. Models were evaluated under fine-tuning and frozen settings, with scaling analyses across dataset sizes. Results show heterogeneous performance across domains: in the most widely studied domain, adult ECG interpretation, three foundation models consistently outperformed strong supervised baselines. In contrast, ECG-CPC, a compact structured state-space model pretrained on HEEDB, dominated other categories where most foundation models failed to surpass supervised learning. Foundation models also displayed distinct scaling behaviors with dataset size, which are critical for small-scale clinical applications. Overall, while foundation models show promise for adult ECG analysis, substantial gaps remain in cardiac structure, outcome prediction, and patient characterization. Notably, ECG-CPC's strong performance despite being orders of magnitude smaller and consuming minimal computational resources highlights untapped opportunities for advancing ECG foundation models.
Related papers
- EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model [46.84040404474695]
EnECG is an ensemble-based framework that integrates multiple specialized foundation models, each excelling in different aspects of ECG interpretation. We show that EnECG can help reduce computational and memory costs while maintaining the strong representational power of foundation models. This framework not only enhances feature extraction and predictive performance but also ensures practical efficiency for real-world clinical applications.
arXiv Detail & Related papers (2025-11-28T07:22:33Z) - An Electrocardiogram Multi-task Benchmark with Comprehensive Evaluations and Insightful Findings [21.836042030973797]
Analyzing the ECG typically requires domain expertise, which is a roadblock to applying artificial intelligence for healthcare. We evaluate language/general time-series/ECG foundation models in comparison with time-series deep learning models. In-depth analyses and insights are provided along with comprehensive experimental results.
arXiv Detail & Related papers (2025-11-28T06:47:21Z) - Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv Detail & Related papers (2025-11-13T02:57:10Z) - Hierarchical Attention Network for Interpretable ECG-based Heart Disease Classification [0.7234862895932991]
We adapt a hierarchical attention network (HAN), originally developed for text classification, into an ECG-based heart-disease classification task. For the MIT-BIH dataset, our adapted HAN achieves 98.55% test accuracy compared to 99.14% for CAT-Net, while reducing the number of model parameters by a factor of 15.6. For the PTB-XL dataset, our adapted HAN achieves a 19.3-fold reduction in model complexity compared to CAT-Net, with only a 5% lower test accuracy.
arXiv Detail & Related papers (2025-03-25T13:06:06Z) - GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [43.65650710265957]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations. We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv Detail & Related papers (2025-03-08T05:48:53Z) - An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains [17.809094003643523]
ECG Foundation Model (ECGFounder) is trained on over 10 million ECGs with 150 label categories from the Harvard-Emory ECG Database. ECGFounder achieves expert-level performance on internal validation sets, with AUROC exceeding 0.95 for eighty diagnoses. When fine-tuned, ECGFounder outperforms baseline models in demographic analysis, clinical event detection, and cross-modality cardiac rhythm diagnosis.
arXiv Detail & Related papers (2024-10-05T12:12:02Z) - ECG-FM: An Open Electrocardiogram Foundation Model [3.8270632390229777]
We present ECG-FM, an open foundation model for ECG analysis, and conduct a study using a dataset of 1.5 million ECGs. ECG-FM is a transformer-based model pretrained using a hybrid contrastive and generative self-supervised learning approach. We affirm that ECG-FM is robust, label-efficient, and functionally discriminative by showcasing data scaling experiments, performing a latent space analysis, and generating saliency maps.
arXiv Detail & Related papers (2024-08-09T17:06:49Z) - TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data [42.96821770394798]
TACCO is a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data.
We conduct experiments on the public MIMIC-III dataset and Emory internal CRADLE dataset over the downstream clinical tasks of phenotype classification and cardiovascular risk prediction.
In-depth model analysis, clustering results analysis, and clinical case studies further validate the improved utilities and insightful interpretations delivered by TACCO.
arXiv Detail & Related papers (2024-06-14T14:18:38Z) - MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [28.35107188450758]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation. We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z) - Scaling Representation Learning from Ubiquitous ECG with State-Space Models [28.776392386988043]
We introduce WildECG, a pre-trained state-space model for representation learning from ECG signals.
We train this model in a self-supervised manner with 275,000 10s ECG recordings collected in the wild and evaluate it on a range of downstream tasks.
arXiv Detail & Related papers (2023-09-26T22:08:19Z) - Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions were developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z) - ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z) - Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv Detail & Related papers (2019-12-28T02:44:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.