From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
- URL: http://arxiv.org/abs/2506.21803v1
- Date: Wed, 11 Jun 2025 07:22:17 GMT
- Title: From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining
- Authors: Fuying Wang, Jiacheng Xu, Lequan Yu
- Abstract summary: We introduce MELP, a novel Multi-scale ECG-Language Pretraining (MELP) model that fully leverages hierarchical supervision from ECG-text pairs. We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning.
- Score: 22.214252217020174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electrocardiograms (ECGs) play a vital role in monitoring cardiac health and diagnosing heart diseases. However, traditional deep learning approaches for ECG analysis rely heavily on large-scale manual annotations, which are both time-consuming and resource-intensive to obtain. To overcome this limitation, self-supervised learning (SSL) has emerged as a promising alternative, enabling the extraction of robust ECG representations that can be efficiently transferred to various downstream tasks. While previous studies have explored SSL for ECG pretraining and multi-modal ECG-language alignment, they often fail to capture the multi-scale nature of ECG signals. As a result, these methods struggle to learn generalized representations due to their inability to model the hierarchical structure of ECG data. To address this gap, we introduce MELP, a novel Multi-scale ECG-Language Pretraining (MELP) model that fully leverages hierarchical supervision from ECG-text pairs. MELP first pretrains a cardiology-specific language model to enhance its understanding of clinical text. It then applies three levels of cross-modal supervision (at the token, beat, and rhythm levels) to align ECG signals with textual reports, capturing structured information across different time scales. We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning. Experimental results demonstrate that MELP outperforms existing SSL methods, underscoring its effectiveness and adaptability across diverse clinical applications. Our code is available at https://github.com/HKU-MedAI/MELP.
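To make the idea of ECG-text alignment and zero-shot classification more concrete, here is a minimal NumPy sketch of a generic CLIP-style symmetric contrastive loss over one global embedding per recording and per report, followed by a nearest-prompt classifier. It is an illustration under stated assumptions, not the MELP implementation: the function names, the temperature value of 0.07, and the use of a single global embedding per sample are all hypothetical, and the token-level and beat-level objectives mentioned in the abstract are not modeled here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Generic symmetric InfoNCE loss (assumed form, not taken from the paper).

    ecg_emb, text_emb: arrays of shape (batch, dim); row i of each is a matched pair.
    """
    ecg = l2_normalize(ecg_emb)
    txt = l2_normalize(text_emb)
    logits = ecg @ txt.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])             # matched pairs sit on the diagonal
    loss_ecg_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_ecg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ecg_to_text + loss_text_to_ecg)


def zero_shot_predict(ecg_emb, class_prompt_emb):
    """Assign each ECG embedding to the class prompt with the highest cosine similarity."""
    sims = l2_normalize(ecg_emb) @ l2_normalize(class_prompt_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))               # stand-in for encoder outputs
    print("matched pairs:   ", symmetric_contrastive_loss(emb, emb))
    print("mismatched pairs:", symmetric_contrastive_loss(emb, emb[::-1]))
```

With correctly paired embeddings the loss is small, and it grows when the pairing is broken, which is the signal that pulls matched ECG and report embeddings together during pretraining. In a zero-shot setting, `class_prompt_emb` would hold text embeddings of class descriptions, so no labeled ECGs are needed at inference time.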
Related papers
- Global and Local Contrastive Learning for Joint Representations from Cardiac MRI and ECG [40.407824759778784]
PTACL (Patient and Temporal Alignment Contrastive Learning) is a multimodal contrastive learning framework that enhances ECG representations by integrating-temporal information from CMR. We evaluate PTACL on paired ECG-CMR data from 27,951 subjects in the UK Biobank. Our results highlight the potential of PTACL to enhance non-invasive cardiac diagnostics using ECG.
arXiv Detail & Related papers (2025-06-24T17:19:39Z)
- Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling [50.58126509704037]
Heartcare Suite is a framework for fine-grained electrocardiogram (ECG) understanding. Heartcare-220K is a high-quality, structured, and comprehensive multimodal ECG dataset. Heartcare-Bench is a benchmark to guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios.
arXiv Detail & Related papers (2025-06-06T07:56:41Z)
- GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [43.65650710265957]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations. We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv Detail & Related papers (2025-03-08T05:48:53Z)
- Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model [25.131870247201636]
We introduce a novel perspective on ECG signals, treating heartbeats as words and rhythms as sentences. We then propose HeartLang, a novel self-supervised learning framework for ECG language processing. We construct the largest heartbeat-based ECG vocabulary to date, which will further advance the development of ECG language processing.
arXiv Detail & Related papers (2025-02-15T07:40:57Z)
- ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text [14.06147507373525]
This study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals.
Our framework comprises two key components: Cardio Query Assistant (CQA) and ECG Semantics Integrator (ESI).
arXiv Detail & Related papers (2024-05-26T06:45:39Z)
- MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [41.324530807795256]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions.
Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation.
We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z)
- ECG-SL: Electrocardiogram(ECG) Segment Learning, a deep learning method for ECG signal [19.885905393439014]
We propose a novel ECG-Segment based Learning (ECG-SL) framework to explicitly model the periodic nature of ECG signals.
Based on the structural features, a temporal model is designed to learn the temporal information for various clinical tasks.
The proposed method outperforms the baseline model and shows competitive performances compared with task-specific methods in three clinical applications.
arXiv Detail & Related papers (2023-10-01T23:17:55Z)
- ETP: Learning Transferable ECG Representations via ECG-Text Pre-training [10.856365645831728]
ECG-Text Pre-training (ETP) is an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports.
ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports.
arXiv Detail & Related papers (2023-09-06T19:19:26Z)
- PulseNet: Deep Learning ECG-signal classification using random augmentation policy and continous wavelet transform for canines [46.09869227806991]
Evaluating canine electrocardiograms (ECGs) requires skilled veterinarians.
Current availability of veterinary cardiologists for ECG interpretation and diagnostic support is limited.
We implement a deep convolutional neural network (CNN) approach for classifying canine electrocardiogram sequences as either normal or abnormal.
arXiv Detail & Related papers (2023-05-17T09:06:39Z)
- Frozen Language Model Helps ECG Zero-Shot Learning [12.974685769614062]
We propose Multimodal ECG-Text Self-supervised pre-training (METS).
We use a trainable ECG encoder and a frozen language model to embed paired ECG and automatically machine-generated clinical reports separately.
In downstream classification tasks, METS achieves around 10% improvement in performance without using any annotated data.
arXiv Detail & Related papers (2023-03-22T05:01:14Z)
- Generalizing electrocardiogram delineation: training convolutional neural networks with synthetic data augmentation [63.51064808536065]
Existing databases for ECG delineation are small, being insufficient in size and in the array of pathological conditions they represent.
This article has two main contributions. First, a pseudo-synthetic data generation algorithm was developed, based on probabilistically composing ECG traces given "pools" of fundamental segments, as cropped from the original databases, and a set of rules for their arrangement into coherent synthetic traces.
Second, two novel segmentation-based loss functions have been developed, which attempt to enforce the prediction of an exact number of independent structures and to produce closer segmentation boundaries by focusing on a reduced number of samples.
arXiv Detail & Related papers (2021-11-25T10:11:41Z)
- ECG-DelNet: Delineation of Ambulatory Electrocardiograms with Mixed Quality Labeling Using Neural Networks [69.25956542388653]
Deep learning (DL) algorithms are gaining weight in academic and industrial settings.
We demonstrate DL can be successfully applied to low interpretative tasks by embedding ECG detection and delineation onto a segmentation framework.
The model was trained using PhysioNet's QT database, comprised of 105 ambulatory ECG recordings.
arXiv Detail & Related papers (2020-05-11T16:29:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.