Related papers: OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records

URL: http://arxiv.org/abs/2503.00711v1
Date: Sun, 02 Mar 2025 03:26:14 GMT
Title: OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records
Authors: Zhijiang Wan, Qianhao Yu, Jia Mao, Wenfeng Duan, Cheng Ding,
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study introduces OpenECG, a large-scale benchmark of 1.2 million 12-lead ECG recordings from nine centers, to evaluate ECG foundation models (ECG-FMs) trained on public datasets. We investigate three self-supervised learning methods (SimCLR, BYOL, MAE) with ResNet-50 and Vision Transformer architectures, assessing model generalization through leave-one-dataset-out experiments and data scaling analysis. Results show that pre-training on diverse datasets significantly improves generalization, with BYOL and MAE outperforming SimCLR, highlighting the efficacy of feature-consistency and generative learning over contrastive approaches. Data scaling experiments reveal that performance saturates at 60-70% of total data for BYOL and MAE, while SimCLR requires more data. These findings demonstrate that publicly available ECG data can match or surpass proprietary datasets in training robust ECG-FMs, paving the way for scalable, clinically meaningful AI-driven ECG analysis.
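The abstract names two evaluation protocols, leave-one-dataset-out and data scaling, without giving their mechanics. The following is a minimal Python sketch of how such splits could be built. It assumes each center's recordings are available as a list of record IDs. The center names, the functions `leave_one_dataset_out` and `scaling_subset`, and the seeded-shuffle design are hypothetical illustrations, not the authors' code, and model pre-training and evaluation are omitted.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical layout: center name -> list of record IDs from that center.
Datasets = Dict[str, List[str]]


def leave_one_dataset_out(
    datasets: Datasets,
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out, pretrain_records, heldout_records), one fold per center.

    Each fold pools every center except one for pre-training; the excluded
    center is never seen during pre-training and serves as the test domain.
    """
    names = sorted(datasets)
    for held_out in names:
        pretrain = [rec for name in names if name != held_out for rec in datasets[name]]
        yield held_out, pretrain, list(datasets[held_out])


def scaling_subset(records: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Return `fraction` of `records` for a data-scaling run.

    Hypothetical design: one seeded shuffle, then a prefix, so subsets taken
    with the same seed are nested (e.g. the 30% subset lies inside the 60% one).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[: round(len(shuffled) * fraction)]


if __name__ == "__main__":
    # Toy, made-up centers; not the nine OpenECG sources.
    toy = {"center_a": ["a1", "a2"], "center_b": ["b1", "b2", "b3"], "center_c": ["c1"]}
    for held_out, pretrain, test in leave_one_dataset_out(toy):
        print(held_out, len(pretrain), len(test))
    # center_a 4 2
    # center_b 3 3
    # center_c 5 1
```

Taking prefixes of a single seeded shuffle keeps the subsets nested across fractions, so a point on the scaling curve only ever adds data to the previous one. This is one common design choice for scaling analyses, not necessarily the one used in OpenECG.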

Related papers

BioSerenity-E1: a self-supervised EEG model for medical applications
BioSerenity-E1 is a family of self-supervised foundation models for clinical EEG applications. It combines spectral tokenization with masked prediction to achieve state-of-the-art performance across relevant diagnostic tasks.
arXiv (2025-03-13T13:42:46Z)
rECGnition_v2.0: Self-Attentive Canonical Fusion of ECG and Patient Data using deep learning for effective Cardiac Diagnostics
This study uses the MIT-BIH Arrhythmia dataset to evaluate the efficiency of rECGnition_v2.0 across various classes of arrhythmias. The compact architecture of rECGnition_v2.0, which has fewer trainable parameters, offers several advantages, including interpretability and scalability.
arXiv (2025-02-22T15:16:46Z)
Fusion of ECG Foundation Model Embeddings to Improve Early Detection of Acute Coronary Syndromes
This study explores the use of ECG foundation models, specifically ST-MEM and ECG-FM, to enhance ACS risk assessment. We evaluate the performance of these models individually and through a fusion approach, in which their embeddings are combined for enhanced prediction.
arXiv (2025-02-17T04:50:56Z)
Self-Supervised Pre-Training with Joint-Embedding Predictive Architecture Boosts ECG Classification Performance
We create a large unsupervised pre-training dataset by combining ten public ECG databases. We pre-train Vision Transformers using JEPA on this dataset and fine-tune them on various PTB-XL benchmarks.
arXiv (2024-10-02T08:25:57Z)
ECG-FM: An Open Electrocardiogram Foundation Model
We present ECG-FM, an open foundation model for ECG analysis. ECG-FM adopts a transformer-based architecture and is pretrained on 2.5 million samples. We show how its command of contextual information results in strong performance, rich pretrained embeddings, and reliable interpretability.
arXiv (2024-08-09T17:06:49Z)
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Current approaches rely primarily on traditional convolutional neural networks designed for processing Euclidean data such as images. This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD that does not require speech stimuli as input.
arXiv (2023-09-07T13:43:46Z)
Learnable Weight Initialization for Volumetric Medical Image Segmentation
We propose a learnable weight-based hybrid medical image segmentation approach. Our approach is easy to integrate into any hybrid model and requires no external training data. Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv (2023-06-15T17:55:05Z)
Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation
Existing databases for ECG delineation are small, insufficient both in size and in the range of pathological conditions they represent. This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces from "pools" of fundamental segments, cropped from the original databases, and a set of rules for arranging them into coherent synthetic traces. Second, two novel segmentation-based loss functions were developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv (2021-11-25T10:11:41Z)
Self-supervised representation learning from 12-lead ECG data
We put forward a comprehensive assessment of self-supervised representation learning from short segments of clinical 12-lead electrocardiography (ECG) data. To this end, we explore adaptations of state-of-the-art self-supervised learning algorithms from computer vision (SimCLR, BYOL, SwAV) and speech (CPC). For the best-performing method, CPC, we find linear evaluation performance only 0.8% below supervised performance.
arXiv (2021-03-23T16:50:39Z)
Deep learning-based COVID-19 pneumonia classification using chest CT images: model generalizability
Deep learning (DL) classification models were trained to identify COVID-19-positive patients on 3D computed tomography (CT) datasets from different countries. We trained nine identical DL-based classification models by using combinations of the datasets with a 72% train, 8% validation, and 20% test data split. The models trained on multiple datasets and evaluated on a test set from one of the datasets used for training performed better.
arXiv (2021-02-18T21:14:52Z)
Uncovering the structure of clinical EEG signals with self-supervised learning
Supervised learning paradigms are often limited by the amount of labeled data that is available. This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG). By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv (2020-07-31T14:34:47Z)
ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks
Deep learning (DL) algorithms are gaining weight in academic and industrial settings. We demonstrate that DL can be successfully applied to low-interpretative tasks by embedding ECG detection and delineation into a segmentation framework. The model was trained using PhysioNet's QT database, comprising 105 ambulatory ECG recordings.
arXiv (2020-05-11T16:29:12Z)
Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare. Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals. This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
arXiv (2019-12-28T02:44:29Z)
