Related papers: Position: Evaluation of ECG Representations Must Be Fixed

URL: http://arxiv.org/abs/2602.17531v1
Date: Thu, 19 Feb 2026 16:42:46 GMT
Title: Position: Evaluation of ECG Representations Must Be Fixed
Authors: Zachary Berger, Daniel Prakah-Asante, John Guttag, Collin M. Stultz
Abstract summary: This position paper argues that current benchmarking practice in 12-lead ECG representation learning must be fixed to ensure progress is reliable and aligned with clinically meaningful objectives. We argue that downstream evaluation should expand to include an assessment of structural heart disease and patient-level forecasting, in addition to other evolving ECG-related endpoints, as relevant clinical targets.
Score: 1.567009619451362
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This position paper argues that current benchmarking practice in 12-lead ECG representation learning must be fixed to ensure progress is reliable and aligned with clinically meaningful objectives. The field has largely converged on three public multi-label benchmarks (PTB-XL, CPSC2018, CSN) dominated by arrhythmia and waveform-morphology labels, even though the ECG is known to encode substantially broader clinical information. We argue that downstream evaluation should expand to include an assessment of structural heart disease and patient-level forecasting, in addition to other evolving ECG-related endpoints, as relevant clinical targets. Next, we outline evaluation best practices for multi-label, imbalanced settings, and show that when they are applied, the literature's current conclusion about which representations perform best is altered. Furthermore, we demonstrate the surprising result that a randomly initialized encoder with linear evaluation matches state-of-the-art pre-training on many tasks. This motivates the use of a random encoder as a reasonable baseline model. We substantiate our observations with an empirical evaluation of three representative ECG pre-training approaches across six evaluation settings: the three standard benchmarks, a structural disease dataset, hemodynamic inference, and patient forecasting.

Related papers

Looking Beyond Accuracy: A Holistic Benchmark of ECG Foundation Models [0.3914676152740142]
This study aims to find an in-depth, comprehensive benchmarking framework for Foundation Models (FMs). We introduce a benchmark methodology that complements performance-based evaluation with representation-level analysis. We also rely on the methodology for carrying out an extensive evaluation of several ECG-expert FMs pretrained via state-of-the-art techniques.
arXiv Detail & Related papers (2026-01-29T15:14:00Z)
Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models [48.95516224614331]
We introduce MedGaze-Bench, the first benchmark leveraging clinician gaze as a Cognitive Cursor to assess intent understanding across surgery, emergency simulation, and diagnostic interpretation. Our benchmark addresses three fundamental challenges: visual homogeneity of anatomical structures, strict temporal-causal dependencies in clinical, and implicit adherence to safety protocols.
arXiv Detail & Related papers (2026-01-11T02:20:40Z)
Transferring Clinical Knowledge into ECGs Representation [0.19498378931702776]
We propose a novel three-stage training paradigm that transfers knowledge from multimodal clinical data into a powerful, yet unimodal, ECG encoder. We employ a self-supervised, joint-embedding pre-training stage to create an ECG representation that is enriched with contextual clinical information. As an indirect way to explain the model's output, we train it to also predict associated laboratory abnormalities directly from the ECG embedding.
arXiv Detail & Related papers (2025-12-07T22:19:24Z)
CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models [13.613519337591507]
Single-lead ECG recording is integrated into both clinical-grade and consumer wearables. While self-supervised pretraining of foundation models on unlabeled ECGs improves diagnostic performance, existing approaches do not incorporate domain knowledge from clinical metadata. We introduce a novel contrastive learning approach that utilizes an established clinical risk score to adaptively weight negative pairs.
arXiv Detail & Related papers (2025-12-01T20:21:44Z)
Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv Detail & Related papers (2025-11-13T02:57:10Z)
Cross-Representation Benchmarking in Time-Series Electronic Health Records for Clinical Outcome Prediction [44.23284500920266]
This benchmark standardises data curation and evaluation across two distinct clinical settings. Experiments reveal that event stream models consistently deliver the strongest performance. We find that feature selection strategies must be adapted to the clinical setting.
arXiv Detail & Related papers (2025-10-10T09:03:47Z)
Explainable AI (XAI) for Arrhythmia detection from electrocardiograms [0.0]
Deep learning has enabled highly accurate arrhythmia detection from electrocardiogram (ECG) signals, but limited interpretability remains a barrier to clinical adoption. This study investigates the application of Explainable AI (XAI) techniques specifically adapted for time-series ECG analysis.
arXiv Detail & Related papers (2025-08-24T10:44:24Z)
EEG-MedRAG: Enhancing EEG-based Clinical Decision-Making via Hierarchical Hypergraph Retrieval-Augmented Generation [45.031633614714]
EEG-MedRAG is a three-layer hypergraph-based retrieval-augmented generation framework. It unifies EEG domain knowledge, individual patient cases, and a large-scale repository into a traversable n-ary relational hypergraph. We introduce the first cross-disease, cross-role EEG clinical QA benchmark, spanning seven disorders and five authentic clinical perspectives.
arXiv Detail & Related papers (2025-08-19T11:12:58Z)
A Systematic Review of ECG Arrhythmia Classification: Adherence to Standards, Fair Evaluation, and Embedded Feasibility [0.1932975952237668]
This review systematically analyzes ECG classification studies published between 2017 and 2024. We identify state-of-the-art methods meeting E3C criteria and conduct a comparative analysis of accuracy, inference time, energy consumption, and memory usage. By addressing these gaps, this study aims to guide future research toward more robust and clinically viable ECG classification systems.
arXiv Detail & Related papers (2025-03-10T12:57:43Z)
Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation [41.82319894067087]
We propose an inter-intra period-aware ECG representation learning approach. Because ECGs of atrial fibrillation patients exhibit irregular RR intervals and absent P-waves, we develop specific pre-training tasks for inter-period and intra-period representations. Our approach demonstrates remarkable AUC performance on the BTCH dataset, i.e., 0.953/0.996 for paroxysmal/persistent atrial fibrillation detection.
arXiv Detail & Related papers (2024-10-08T10:03:52Z)
ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings. We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework. The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.