High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
- URL: http://arxiv.org/abs/2510.05492v2
- Date: Thu, 09 Oct 2025 00:47:14 GMT
- Title: High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
- Authors: Zhuoyi Huang, Nutan Sahoo, Anamika Kumari, Girish Kumar, Kexuan Cai, Shixing Cao, Yue Kang, Tian Xia, Somya Chatterjee, Nicholas Hausman, Aidan Jay, Eric S. Rosenthal, Soundar Srinivasan, Sadid Hasan, Alex Fedorov, Sulaiman Vesal
- Abstract summary: Development of machine learning for cardiac care is hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. In this work, we address two major shortcomings of current generative ECG methods. We build on a conditional diffusion-based Structured State Space Model (SSSD-ECG) with two principled innovations.
- Score: 3.864395218585964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. Although generative AI offers a promising solution, the real-world use of existing model-synthesized ECGs is limited by persistent gaps in trustworthiness and clinical utility. In this work, we address two major shortcomings of current generative ECG methods: insufficient morphological fidelity and the inability to generate personalized, patient-specific physiological signals. To address these gaps, we build on a conditional diffusion-based Structured State Space Model (SSSD-ECG) with two principled innovations: (1) MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a novel training paradigm with time-frequency domain supervision to enforce physiological structural realism, and (2) multi-modal demographic conditioning to enable patient-specific synthesis. We comprehensively evaluate our approach on the PTB-XL dataset, assessing the synthesized ECG signals on fidelity, clinical coherence, privacy preservation, and downstream task utility. MIDT-ECG achieves substantial gains: it improves morphological coherence, preserves strong privacy guarantees with all metrics evaluated exceeding the baseline by 4-8%, and notably reduces the interlead correlation error by an average of 74%, while demographic conditioning enhances signal-to-noise ratio and personalization. In critical low-data regimes, a classifier trained on datasets supplemented with our synthetic ECGs achieves performance comparable to a classifier trained solely on real data. Together, we demonstrate that ECG synthesizers, trained with the proposed time-frequency structural regularization scheme, can serve as personalized, high-fidelity, privacy-preserving surrogates when real data are scarce, advancing the responsible use of generative AI in healthcare.
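The abstract reports a reduction in interlead correlation error but does not define the metric. The following is a minimal sketch of one plausible definition, assuming the error is the mean absolute difference between the lead-by-lead Pearson correlation matrices of a real and a synthetic recording. The function name `interlead_correlation_error`, the use of Pearson correlation, and the averaging over lead pairs are all assumptions made for illustration and may differ from the paper's actual formula.

```python
import numpy as np


def interlead_correlation_error(real, synthetic):
    """Hypothetical metric: mean absolute difference between the
    off-diagonal entries of two lead-by-lead Pearson correlation matrices.

    real, synthetic: arrays of shape (n_leads, n_samples).
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    if real.ndim != 2 or synthetic.ndim != 2:
        raise ValueError("inputs must have shape (n_leads, n_samples)")
    if real.shape[0] != synthetic.shape[0]:
        raise ValueError("both recordings must have the same number of leads")

    # np.corrcoef treats each row (lead) as one variable.
    corr_real = np.corrcoef(real)
    corr_syn = np.corrcoef(synthetic)

    # Compare each unordered lead pair once; the diagonal is always 1.
    pairs = np.triu_indices(real.shape[0], k=1)
    return float(np.mean(np.abs(corr_real[pairs] - corr_syn[pairs])))


# Usage with placeholder data (random noise, not real ECG):
rng = np.random.default_rng(0)
real_ecg = rng.standard_normal((12, 1000))
synthetic_ecg = rng.standard_normal((12, 1000))
print(interlead_correlation_error(real_ecg, synthetic_ecg))
```

Under this definition a value of 0 means the synthetic recording reproduces the pairwise lead relationships of the real one exactly, and the maximum possible value is 2.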
Related papers
- Synthetic Electrogram Generation with Variational Autoencoders for ECGI [0.0]
We propose variational autoencoders (VAEs) for the generation of synthetic multichannel atrial EGMs. Two models are proposed: a sinus rhythm-specific VAE (VAE-S) and a class-conditioned VAE (VAE-C) trained on both sinus rhythm and AF signals. VAE-S achieves higher fidelity with respect to in silico EGMs, while VAE-C enables rhythm-specific generation at the expense of reduced sinus reconstruction quality.
arXiv Detail & Related papers (2025-12-16T16:13:25Z)
- Transferring Clinical Knowledge into ECGs Representation [0.19498378931702776]
We propose a novel three-stage training paradigm that transfers knowledge from multimodal clinical data into a powerful, yet unimodal, ECG encoder. We employ a self-supervised, joint-embedding pre-training stage to create an ECG representation that is enriched with contextual clinical information. As an indirect way to explain the model's output, we train it to also predict associated laboratory abnormalities directly from the ECG embedding.
arXiv Detail & Related papers (2025-12-07T22:19:24Z)
- Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv Detail & Related papers (2025-11-13T02:57:10Z)
- Domain Knowledge is Power: Leveraging Physiological Priors for Self Supervised Representation Learning in Electrocardiography [3.1670118965354934]
We introduce PhysioCLR (Physiology-aware Contrastive Learning Representation for ECG), a physiology-aware contrastive learning framework. During pretraining, PhysioCLR learns to bring together embeddings of samples that share similar clinically relevant features. We evaluate PhysioCLR on two public ECG datasets, Chapman and Georgia, for multilabel ECG diagnoses.
arXiv Detail & Related papers (2025-09-09T19:44:50Z)
- Improving Myocardial Infarction Detection via Synthetic ECG Pretraining [0.0]
Myocardial infarction is a major cause of death globally, and accurate early diagnosis from electrocardiograms (ECGs) remains a clinical priority. Deep learning models have shown promise for automated ECG interpretation, but require large amounts of labeled data. We propose a physiology-aware pipeline that synthesizes 12-lead ECGs with tunable MI morphology and realistic noise.
arXiv Detail & Related papers (2025-06-29T14:29:55Z)
- DiffuSETS: 12-lead ECG Generation Conditioned on Clinical Text Reports and Patient-Specific Information [13.680337221159506]
Heart disease remains a significant threat to human health. Scarcity of high-quality ECG data, driven by privacy concerns and limited medical resources, creates a pressing need for effective ECG signal generation. We propose DiffuSETS, a novel framework capable of generating ECG signals with high semantic alignment and fidelity.
arXiv Detail & Related papers (2025-01-10T12:55:34Z)
- Synthetic Time Series Data Generation for Healthcare Applications: A PCG Case Study [43.28613210217385]
We employ and compare three state-of-the-art generative models to generate PCG data. Our results demonstrate that the generated PCG data closely resembles the original datasets. In our future work, we plan to incorporate this method into a data augmentation pipeline to synthesize abnormal PCG signals with heart murmurs.
arXiv Detail & Related papers (2024-12-17T18:07:40Z)
- Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation [41.82319894067087]
We propose an inter-intra period-aware ECG representation learning approach.
Because ECGs of atrial fibrillation patients exhibit irregular RR intervals and absent P-waves, we develop specific pre-training tasks for interperiod and intraperiod representations.
Our approach demonstrates remarkable AUC performance on the BTCH dataset, i.e., 0.953/0.996 for paroxysmal/persistent atrial fibrillation detection.
arXiv Detail & Related papers (2024-10-08T10:03:52Z)
- SSSD-ECG-nle: New Label Embeddings with Structured State-Space Models for ECG generation [0.0]
Diffusion models have made significant progress in recent years, making it possible to synthesize data comparable to real data.
We propose the SSSD-ECG-nle architecture based on SSSD-ECG with a modified conditioning mechanism and demonstrate its efficiency on downstream tasks.
arXiv Detail & Related papers (2024-07-15T16:31:25Z)
- MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [28.35107188450758]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation. We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z)
- Improving Diffusion Models for ECG Imputation with an Augmented Template Prior [43.6099225257178]
Noisy and poor-quality recordings are a major issue for signals collected using mobile health systems.
Recent studies have explored the imputation of missing values in ECG with probabilistic time-series models.
We present a template-guided denoising diffusion probabilistic model (DDPM), PulseDiff, which is conditioned on an informative prior for a range of health conditions.
arXiv Detail & Related papers (2023-10-24T11:34:15Z)
- ECGAN: Self-supervised generative adversarial network for electrocardiography [11.460692362624533]
High-quality synthetic data can support the development of effective predictive models for biomedical tasks.
These limitations, for instance, negatively impact open access to electrocardiography datasets about arrhythmias.
This work introduces a self-supervised approach to the generation of synthetic electrocardiography time series.
arXiv Detail & Related papers (2023-01-23T15:48:02Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.