Multimodal Latent Fusion of ECG Leads for Early Assessment of Pulmonary Hypertension
- URL: http://arxiv.org/abs/2503.13470v2
- Date: Mon, 08 Sep 2025 14:41:35 GMT
- Title: Multimodal Latent Fusion of ECG Leads for Early Assessment of Pulmonary Hypertension
- Authors: Mohammod N. I. Suvon, Shuo Zhou, Prasun C. Tripathi, Wenrui Fan, Samer Alabed, Bishesh Khanal, Venet Osmani, Andrew J. Swift, Chen Chen, Haiping Lu
- Abstract summary: We propose a lead-specific electrocardiogram multimodal variational autoencoder (LS-EMVAE). LS-EMVAE incorporates a hierarchical modality expert (HiME) fusion mechanism and a latent representation alignment loss. We validate LS-EMVAE across two retrospective cohorts in a 6L-ECG setting.
- Score: 30.124231086488976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in early assessment of pulmonary hypertension (PH) primarily focus on applying machine learning methods to centralized diagnostic modalities, such as 12-lead electrocardiogram (12L-ECG). Despite their potential, these approaches fall short in decentralized clinical settings, e.g., point-of-care and general practice, where handheld 6-lead ECG (6L-ECG) can offer an alternative but is limited by the scarcity of labeled data for developing reliable models. To address this, we propose a lead-specific electrocardiogram multimodal variational autoencoder (LS-EMVAE), which incorporates a hierarchical modality expert (HiME) fusion mechanism and a latent representation alignment loss. HiME combines mixture-of-experts and product-of-experts to enable flexible, adaptive latent fusion, while the alignment loss improves coherence among lead-specific and shared representations. To alleviate data scarcity and enhance representation learning, we adopt a transfer learning strategy: the model is first pre-trained on a large unlabeled 12L-ECG dataset and then fine-tuned on smaller task-specific labeled 6L-ECG datasets. We validate LS-EMVAE across two retrospective cohorts in a 6L-ECG setting: 892 subjects from the ASPIRE registry for (1) PH detection and (2) phenotyping pre-/post-capillary PH, and 16,416 subjects from UK Biobank for (3) predicting elevated pulmonary arterial wedge pressure, where it consistently outperforms unimodal and multimodal baseline methods and demonstrates strong generalizability and interpretability. The code is available at https://github.com/Shef-AIRE/LS-EMVAE.
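The abstract names two standard ways of fusing per-modality Gaussian posteriors, product-of-experts and mixture-of-experts, which HiME combines. The sketch below is a minimal, pure-Python illustration of those two primitives for diagonal Gaussians. It is not the authors' implementation (the linked repository is the reference for that): the function names, the `Gaussian` alias, and the moment-matching shortcut used to summarise the mixture as a single Gaussian are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical alias: (mean, variance) per latent dimension of one expert.
Gaussian = Tuple[List[float], List[float]]


def product_of_experts(experts: Sequence[Gaussian]) -> Gaussian:
    """Fuse diagonal Gaussians by multiplying their densities.

    The result is Gaussian with precision equal to the sum of the expert
    precisions and a precision-weighted mean.
    """
    dims = len(experts[0][0])
    mean, var = [], []
    for d in range(dims):
        precision = sum(1.0 / v[d] for _, v in experts)
        weighted_mean = sum(m[d] / v[d] for m, v in experts)
        var.append(1.0 / precision)
        mean.append(weighted_mean / precision)
    return mean, var


def mixture_of_experts(
    experts: Sequence[Gaussian],
    weights: Optional[Sequence[float]] = None,
) -> Gaussian:
    """Summarise a weighted mixture of diagonal Gaussians by moment matching.

    Assumption: a real mixture is not Gaussian; here we only report its
    mean and variance so the two fusion rules can be compared directly.
    """
    n = len(experts)
    w = list(weights) if weights is not None else [1.0 / n] * n
    dims = len(experts[0][0])
    mean, var = [], []
    for d in range(dims):
        mu = sum(wi * m[d] for wi, (m, _) in zip(w, experts))
        second_moment = sum(
            wi * (v[d] + m[d] ** 2) for wi, (m, v) in zip(w, experts)
        )
        mean.append(mu)
        var.append(second_moment - mu ** 2)
    return mean, var
```

With two unit-variance experts centred at 0 and 2, both rules give a fused mean of 1, but the product is sharper than either expert (variance 0.5) while the moment-matched mixture is broader (variance 2.0). That contrast is the usual motivation for combining the two rules.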
Related papers
- CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models [13.613519337591507]
Single-lead ECG recording is integrated into both clinical-grade and consumer wearables. While self-supervised pretraining of foundation models on unlabeled ECGs improves diagnostic performance, existing approaches do not incorporate domain knowledge from clinical metadata. We introduce a novel contrastive learning approach that utilizes an established clinical risk score to adaptively weight negative pairs.
arXiv Detail & Related papers (2025-12-01T20:21:44Z)
- Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv Detail & Related papers (2025-11-13T02:57:10Z)
- A Benchmark Study of Deep Learning Methods for Multi-Label Pediatric Electrocardiogram-Based Cardiovascular Disease Classification [0.0]
This paper presents the first benchmark study of deep learning for multi-label pediatric CVD classification on the recently released ZZU-pECG dataset. We evaluate four representative paradigms (ResNet-1D, BiLSTM, Transformer, and Mamba 2) under both 9-lead and 12-lead configurations. All models achieved strong results, with Hamming Loss as low as 0.0069 and F1-scores above 85% in most settings.
arXiv Detail & Related papers (2025-10-04T11:08:46Z)
- EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models [82.43729208063468]
Recent benchmarks for medical Large Vision-Language Models (LVLMs) emphasize leaderboard accuracy, overlooking reliability and safety. We study sycophancy: models' tendency to uncritically echo user-provided information. We introduce EchoBench, a benchmark to systematically evaluate sycophancy in medical LVLMs.
arXiv Detail & Related papers (2025-09-24T14:09:55Z)
- Explainable AI (XAI) for Arrhythmia detection from electrocardiograms [0.0]
Deep learning has enabled highly accurate arrhythmia detection from electrocardiogram (ECG) signals, but limited interpretability remains a barrier to clinical adoption. This study investigates the application of Explainable AI (XAI) techniques specifically adapted for time-series ECG analysis.
arXiv Detail & Related papers (2025-08-24T10:44:24Z)
- Sensing Cardiac Health Across Scenarios and Devices: A Multi-Modal Foundation Model Pretrained on Heterogeneous Data from 1.7 Million Individuals [36.08910150609342]
We present a cardiac sensing foundation model (CSFM) that learns unified representations from vast, heterogeneous health records. Our model is pretrained on an innovative multi-modal integration of data from multiple large-scale datasets. CSFM consistently outperforms traditional one-modal-one-task approaches.
arXiv Detail & Related papers (2025-06-23T20:58:12Z)
- Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling [50.58126509704037]
Heartcare Suite is a framework for fine-grained electrocardiogram (ECG) understanding. Heartcare-220K is a high-quality, structured, and comprehensive multimodal ECG dataset. Heartcare-Bench is a benchmark to guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios.
arXiv Detail & Related papers (2025-06-06T07:56:41Z)
- xLSTM-ECG: Multi-label ECG Classification via Feature Fusion with xLSTM [14.02717596836022]
We propose xLSTM-ECG, a novel approach for multi-label classification of ECG signals.
To the best of our knowledge, this work represents the first design and application of xLSTM modules specifically adapted for multi-label ECG classification.
arXiv Detail & Related papers (2025-04-14T16:12:46Z)
- GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [44.50428701650495]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations. We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv Detail & Related papers (2025-03-08T05:48:53Z)
- Conditional Electrocardiogram Generation Using Hierarchical Variational Autoencoders [0.0]
We propose a publicly available conditional Nouveau VAE model for ECG signal generation (cNVAE-ECG).
arXiv Detail & Related papers (2025-03-03T13:30:36Z)
- An Electrocardiogram Foundation Model Built on over 10 Million Recordings with External Evaluation across Multiple Domains [17.809094003643523]
We introduce an ECG Foundation Model (ECGFounder) to broaden the diagnostic capabilities of ECG analysis.
ECGFounder was trained on over 10 million ECGs with 150 label categories from the Harvard-Emory ECG Database.
It achieves expert-level performance on internal validation sets, with AUROC exceeding 0.95 for eighty diagnoses.
arXiv Detail & Related papers (2024-10-05T12:12:02Z)
- Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG [19.74009541199362]
This study proposes a multi-channel masked autoencoder (MCMA) for reconstructing 12-Lead ECG from arbitrary single-lead ECG.
In the signal-level evaluation, MCMA achieves mean square errors of 0.0317 and 0.1034 and Pearson correlation coefficients of 0.7885 and 0.7420.
In the feature-level evaluation, the average standard deviation of the mean heart rate across the generated 12-lead ECG is 1.0481, the coefficient of variation is 1.58%, and the range is 3.2874.
arXiv Detail & Related papers (2024-07-16T08:17:45Z)
- Foundation Models for ECG: Leveraging Hybrid Self-Supervised Learning for Advanced Cardiac Diagnostics [2.948318253609515]
Using foundation models enhanced by self-supervised learning (SSL) methods presents an innovative approach to electrocardiogram (ECG) analysis.
This study comprehensively evaluates foundation models for ECGs, leveraging SSL methods, including generative and contrastive learning.
We developed a Hybrid Learning (HL) approach for foundation models that improves the precision and reliability of cardiac diagnostics.
arXiv Detail & Related papers (2024-06-26T02:24:13Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement [10.611952462532908]
Multimodal ECG Representation Learning (MERL) is capable of performing zero-shot ECG classification with text prompts.
We propose the Clinical Knowledge Enhanced Prompt Engineering (CKEPE) approach to exploit external expert-verified clinical knowledge databases.
MERL achieves an average AUC score of 75.2% in zero-shot classification (without training data), 3.2% higher than linear probed eSSL methods with 10% annotated training data, averaged across all six datasets.
arXiv Detail & Related papers (2024-03-11T12:28:55Z)
- Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
- A Dual-scale Lead-seperated Transformer With Lead-orthogonal Attention And Meta-information For Ecg Classification [26.07181634056045]
This work proposes a dual-scale lead-separated transformer with lead-orthogonal attention and meta-information (DLTM-ECG).
ECG segments are interpreted as independent patches, and together with the reduced dimension signal, they form a dual-scale representation.
Our work has the potential for similar multichannel bioelectrical signal processing and physiological multimodal tasks.
arXiv Detail & Related papers (2022-11-23T08:45:34Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- Identification of Ischemic Heart Disease by using machine learning technique based on parameters measuring Heart Rate Variability [50.591267188664666]
In this study, 18 non-invasive features (age, gender, left ventricular ejection fraction and 15 obtained from HRV) of 243 subjects were used to train and validate a series of ANNs.
The best result was obtained using 7 input parameters and 7 hidden nodes, with accuracies of 98.9% and 82% on the training and validation datasets, respectively.
arXiv Detail & Related papers (2020-10-29T19:14:41Z)
- Interpretable Deep Learning for Automatic Diagnosis of 12-lead Electrocardiogram [15.464768773761527]
We developed a deep neural network for multi-label classification of cardiac arrhythmias in 12-lead ECG recordings.
The proposed model achieved an average area under the receiver operating characteristic curve (AUC) of 0.970 and an average F1 score of 0.813.
The best-performing leads are lead I, aVR, and V5 among 12 leads.
arXiv Detail & Related papers (2020-10-20T14:51:00Z)
- Performance of Dual-Augmented Lagrangian Method and Common Spatial Patterns applied in classification of Motor-Imagery BCI [68.8204255655161]
Motor-imagery based brain-computer interfaces (MI-BCI) have the potential to become ground-breaking technologies for neurorehabilitation.
Due to the noisy nature of the used EEG signal, reliable BCI systems require specialized procedures for features optimization and extraction.
arXiv Detail & Related papers (2020-10-13T20:50:13Z)
- ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)