Related papers: ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation

URL: http://arxiv.org/abs/2602.04279v1
Date: Wed, 04 Feb 2026 07:17:55 GMT
Title: ECG-R1: Protocol-Guided and Modality-Agnostic MLLM for Reliable ECG Interpretation
Authors: Jiarui Jin, Haoyu Wang, Xingliang Wu, Xiaocheng Fang, Xiang Lan, Zihan Wang, Deyun Zhang, Bo Liu, Yingying Zhang, Xian Wu, Hongyan Li, Shenda Hong
Abstract summary: Existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation. ECG-R1 is the first reasoning MLLM designed for reliable ECG interpretation. Code and data are publicly available at https://github.com/PKUDigitalHealth/ECG-R1, and an online platform is available at http://ai.heartvoice.com.cn/ECG-R1/.
Score: 36.244601234085856
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Electrocardiography (ECG) serves as an indispensable diagnostic tool in clinical practice, yet existing multimodal large language models (MLLMs) remain unreliable for ECG interpretation, often producing plausible but clinically incorrect analyses. To address this, we propose ECG-R1, the first reasoning MLLM designed for reliable ECG interpretation via three innovations. First, we construct the interpretation corpus using Protocol-Guided Instruction Data Generation, grounding interpretation in measurable ECG features and monograph-defined quantitative thresholds and diagnostic logic. Second, we present a modality-decoupled architecture with Interleaved Modality Dropout to improve robustness and cross-modal consistency when either the ECG signal or ECG image is missing. Third, we present Reinforcement Learning with ECG Diagnostic Evidence Rewards to strengthen evidence-grounded ECG interpretation. Additionally, we systematically evaluate the ECG interpretation capabilities of proprietary, open-source, and medical MLLMs, and provide the first quantitative evidence that severe hallucinations are widespread, suggesting that the public should not directly trust these outputs without independent verification. Code and data are publicly available at https://github.com/PKUDigitalHealth/ECG-R1, and an online platform can be accessed at http://ai.heartvoice.com.cn/ECG-R1/.
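The abstract describes grounding interpretation in measurable ECG features and quantitative thresholds. The following sketch illustrates what threshold-based rules that carry their own evidence can look like. It is an illustration under stated assumptions, not ECG-R1's pipeline: the abstract does not name the monograph or its rules, so the cut-offs below are common adult reference values (heart rate below 60 or above 100 bpm, PR interval above 200 ms, QRS duration of 120 ms or more), and `Measurements` and `protocol_findings` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative adult thresholds widely cited in clinical ECG references.
# These are assumptions; ECG-R1's actual monograph and rule set are not reproduced.
BRADY_BPM = 60
TACHY_BPM = 100
PR_LONG_MS = 200
QRS_WIDE_MS = 120


@dataclass
class Measurements:
    """Hypothetical container for features already measured from one ECG."""
    heart_rate_bpm: float
    pr_ms: float
    qrs_ms: float


def protocol_findings(m: Measurements) -> list[str]:
    """Return findings, each stating the measured value and the threshold it crossed."""
    findings = []
    if m.heart_rate_bpm < BRADY_BPM:
        findings.append(f"bradycardia: HR {m.heart_rate_bpm:.0f} bpm < {BRADY_BPM} bpm")
    elif m.heart_rate_bpm > TACHY_BPM:
        findings.append(f"tachycardia: HR {m.heart_rate_bpm:.0f} bpm > {TACHY_BPM} bpm")
    if m.pr_ms > PR_LONG_MS:
        findings.append(f"prolonged PR: {m.pr_ms:.0f} ms > {PR_LONG_MS} ms")
    if m.qrs_ms >= QRS_WIDE_MS:
        findings.append(f"wide QRS: {m.qrs_ms:.0f} ms >= {QRS_WIDE_MS} ms")
    return findings


print(protocol_findings(Measurements(heart_rate_bpm=52, pr_ms=230, qrs_ms=98)))
# ['bradycardia: HR 52 bpm < 60 bpm', 'prolonged PR: 230 ms > 200 ms']
```

Keeping the measured value and the threshold inside each finding string is one simple way to make the evidence behind a conclusion explicit and checkable.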

Related papers

Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation. SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder. Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv (2025-11-13T02:57:10Z)
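The summary does not say which ODE SE-Diff's simulator uses. To illustrate what an ODE-based ECG simulator is, the sketch integrates the classic McSharry et al. (2003) dynamical model, in which a point circles a limit cycle and each of the P, Q, R, S and T waves is a Gaussian push on the z coordinate. The wave parameters are the commonly quoted defaults for that model, forward Euler stands in for a proper ODE solver for brevity, the output is in arbitrary units, and the function name `simulate_ecg` is hypothetical.

```python
import numpy as np

# Commonly quoted McSharry-model parameters for the P, Q, R, S, T waves:
# angular position on the limit cycle (rad), amplitude, and Gaussian width (rad).
THETA = np.array([-np.pi / 3, -np.pi / 12, 0.0, np.pi / 12, np.pi / 2])
A = np.array([1.2, -5.0, 30.0, -7.5, 0.75])
B = np.array([0.25, 0.1, 0.1, 0.1, 0.4])


def simulate_ecg(duration_s=2.0, fs=1000, heart_rate_bpm=60.0):
    """Forward-Euler integration of the McSharry ECG dynamical model (arbitrary units)."""
    dt = 1.0 / fs
    omega = 2.0 * np.pi * heart_rate_bpm / 60.0  # one revolution per beat
    n = int(duration_s * fs)
    x, y, z = -1.0, 0.0, 0.0  # start on the unit circle, half a beat before the R wave
    out = np.empty(n)
    for k in range(n):
        alpha = 1.0 - np.hypot(x, y)  # attracts (x, y) back to the unit circle
        theta = np.arctan2(y, x)
        # Angular distance to each wave centre, wrapped into [-pi, pi).
        dtheta = np.mod(theta - THETA + np.pi, 2.0 * np.pi) - np.pi
        dx = alpha * x - omega * y
        dy = alpha * y + omega * x
        # Each wave adds a Gaussian-shaped push to z; the -z term relaxes z to baseline 0.
        dz = -np.sum(A * dtheta * np.exp(-dtheta**2 / (2.0 * B**2))) - z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[k] = z
    return out


ecg = simulate_ecg()
print(ecg.shape, int(np.argmax(ecg)))
```

A simulator like this yields a clean, physiologically shaped template; per the summary, SE-Diff feeds such a simulator into a diffusion model through a beat decoder, which this sketch does not attempt.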
UniECG: Understanding and Generating ECG in One Unified Model [26.641666246045133]
We propose UniECG, the first unified model for ECG capable of concurrently performing evidence-based ECG interpretation and text-conditioned ECG generation tasks. UniECG can autonomously choose to interpret or generate an ECG based on user input, significantly extending the capability boundaries of current ECG models.
arXiv (2025-09-23T03:15:53Z)
ECG-aBcDe: Overcoming Model Dependence, Encoding ECG into a Universal Language for Any LLM [7.632459372363093]
Large Language Models (LLMs) hold significant promise for electrocardiogram (ECG) analysis. However, current methods rely on model-specific ECG encoders, which hinders transfer across LLMs. We introduce ECG-aBcDe, a novel encoding method that transforms ECG signals into a universal ECG language readily interpretable by any LLM.
arXiv (2025-09-16T03:41:02Z)
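The summary does not describe how ECG-aBcDe builds its "universal ECG language". To make the general idea of turning a waveform into plain text that any LLM can read concrete, this sketch swaps in a different, well-known technique, Symbolic Aggregate approXimation (SAX): z-normalize the signal, average it over equal-length segments, and map each average to a letter using Gaussian breakpoints. This is not ECG-aBcDe's encoding; the alphabet size of 4 and the function name `sax_encode` are assumptions.

```python
import numpy as np

# Breakpoints that split N(0, 1) into 4 equiprobable bins (SAX with alphabet size 4).
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"


def sax_encode(signal, n_segments):
    """Symbolic Aggregate approXimation: z-normalize, average per segment, map to letters."""
    x = np.asarray(signal, dtype=float)
    if len(x) % n_segments:
        raise ValueError("signal length must be divisible by n_segments")
    x = (x - x.mean()) / (x.std() + 1e-12)          # z-normalization
    paa = x.reshape(n_segments, -1).mean(axis=1)     # piecewise aggregate approximation
    idx = np.digitize(paa, BREAKPOINTS)              # bin index 0..3 for each segment
    return "".join(ALPHABET[i] for i in idx)


print(sax_encode(np.array([0, 0, 1, 1, 2, 2, 3, 3]), n_segments=4))  # abcd
```

Because the result is ordinary text, it can be placed directly in a prompt without a model-specific encoder, which is the property the summary emphasizes.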
EEG-MedRAG: Enhancing EEG-based Clinical Decision-Making via Hierarchical Hypergraph Retrieval-Augmented Generation [45.031633614714]
EEG-MedRAG is a three-layer hypergraph-based retrieval-augmented generation framework. It unifies EEG domain knowledge, individual patient cases, and a large-scale repository into a traversable n-ary relational hypergraph. We introduce the first cross-disease, cross-role EEG clinical QA benchmark, spanning seven disorders and five authentic clinical perspectives.
arXiv (2025-08-19T11:12:58Z)
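The summary describes linking domain knowledge, patient cases, and a repository through n-ary relations. A hyperedge differs from an ordinary graph edge in that it can join any number of nodes at once. This toy sketch shows that data structure and a naive retrieval step that ranks hyperedges by how many query entities they contain. The single flat hypergraph, the overlap score, and all class names, edge ids, and entities are hypothetical; EEG-MedRAG's three-layer design and its traversal are not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Toy n-ary hypergraph: each hyperedge links any number of entity nodes."""
    edges: dict[str, frozenset[str]] = field(default_factory=dict)

    def add_edge(self, edge_id: str, nodes: set[str]) -> None:
        self.edges[edge_id] = frozenset(nodes)

    def retrieve(self, query_entities: set[str], k: int = 2) -> list[str]:
        """Rank hyperedges by how many query entities they contain (ties broken by id)."""
        scored = [(len(nodes & query_entities), eid) for eid, nodes in self.edges.items()]
        scored = [(s, eid) for s, eid in scored if s > 0]   # drop edges with no overlap
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [eid for _, eid in scored[:k]]


g = Hypergraph()
g.add_edge("kb:absence", {"spike_and_wave", "3_hz", "absence_epilepsy"})   # knowledge
g.add_edge("case:017", {"patient_017", "spike_and_wave", "3_hz"})          # patient case
g.add_edge("case:042", {"patient_042", "alpha_rhythm"})
print(g.retrieve({"spike_and_wave", "3_hz"}))  # ['case:017', 'kb:absence']
```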
MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations [12.00096975933262]
The electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. We introduce MEETI, the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models.
arXiv (2025-07-21T05:32:44Z)
From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining [22.214252217020174]
We introduce MELP, a novel Multi-scale ECG-Language Pretraining model that fully leverages hierarchical supervision from ECG-text pairs. We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning.
arXiv (2025-06-11T07:22:17Z)
Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling [50.58126509704037]
Heartcare Suite is a framework for fine-grained electrocardiogram (ECG) understanding. Heartcare-220K is a high-quality, structured, and comprehensive multimodal ECG dataset. Heartcare-Bench is a benchmark to guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios.
arXiv (2025-06-06T07:56:41Z)
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [44.50428701650495]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations. We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv (2025-03-08T05:48:53Z)
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture [0.0]
We introduce ECG-JEPA, a self-supervised learning model for 12-lead ECG analysis. It learns semantic representations of ECG data by predicting in the hidden latent space. ECG-JEPA achieves state-of-the-art performance in various downstream tasks, including ECG classification and feature prediction.
arXiv (2024-10-11T06:30:48Z)
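The summary says ECG-JEPA learns by predicting in latent space rather than reconstructing raw samples. This sketch computes a JEPA-style objective for one lead split into patches: a context encoder sees only the visible patches, a target encoder embeds the hidden ones, and a predictor, told which positions to fill through positional embeddings, must match those target embeddings. Linear maps stand in for the transformers a real model would use, the target encoder is updated by an exponential moving average as in I-JEPA, and there is no training loop; all names, shapes, and the mean-pooled context are assumptions, not ECG-JEPA's implementation.

```python
import numpy as np


def jepa_loss(patches, mask, pos, w_ctx, w_tgt, w_pred):
    """Toy JEPA-style objective: predict embeddings of masked ECG patches.

    patches: (n, patch_len) consecutive segments of one lead.
    mask:    (n,) bool, True for patches hidden from the context encoder.
    pos:     (n, d) positional embeddings telling the predictor which patch to predict.
    w_ctx, w_tgt: (patch_len, d) linear "encoders"; w_pred: (d, d) linear predictor.
    """
    context = (patches[~mask] @ w_ctx).mean(axis=0)  # pooled embedding of visible patches
    targets = patches[mask] @ w_tgt                  # target-encoder embeddings of hidden patches
    preds = (context[None, :] + pos[mask]) @ w_pred  # one prediction per hidden position
    return float(np.mean((preds - targets) ** 2))   # error measured in latent space


def ema_update(w_tgt, w_ctx, tau=0.996):
    """Target encoder slowly tracks the context encoder by exponential moving average."""
    return tau * w_tgt + (1.0 - tau) * w_ctx


rng = np.random.default_rng(0)
n, patch_len, d = 10, 50, 8
patches = rng.standard_normal((n, patch_len))
mask = np.zeros(n, dtype=bool)
mask[6:] = True                                      # hide the last 4 patches
pos = rng.standard_normal((n, d))
w_ctx = rng.standard_normal((patch_len, d)) * 0.1
w_pred = rng.standard_normal((d, d)) * 0.1
w_tgt = w_ctx.copy()
loss = jepa_loss(patches, mask, pos, w_ctx, w_tgt, w_pred)
print(loss)
```

Computing the loss on embeddings lets the model ignore unpredictable sample-level noise, which is the usual motivation for latent-space prediction over raw-signal reconstruction.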
Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation [41.82319894067087]
We propose an inter-intra period-aware ECG representation learning approach. Considering that ECGs of atrial fibrillation patients exhibit irregular RR intervals and absent P waves, we develop specific pre-training tasks for inter-period and intra-period representations. Our approach achieves AUCs of 0.953 and 0.996 for paroxysmal and persistent atrial fibrillation detection, respectively, on the BTCH dataset.
arXiv (2024-10-08T10:03:52Z)
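The summary grounds its pre-training tasks in two properties of atrial fibrillation ECGs: irregular RR intervals and absent P waves. The paper learns these properties through self-supervised pre-training; this hand-crafted sketch only illustrates how RR irregularity can be quantified from R-peak times, using the coefficient of variation and RMSSD (root mean square of successive RR differences). It is not the paper's method, it assumes R peaks are already detected, the function name `rr_irregularity` is hypothetical, and it deliberately sets no AF decision threshold.

```python
import numpy as np


def rr_irregularity(r_peak_times_s):
    """RR-interval irregularity features from R-peak times given in seconds."""
    rr = np.diff(np.asarray(r_peak_times_s, dtype=float))  # RR intervals (s)
    if len(rr) < 2:
        raise ValueError("need at least three R peaks")
    successive = np.diff(rr)                               # beat-to-beat RR changes (s)
    return {
        "mean_rr_s": float(rr.mean()),
        "cv_rr": float(rr.std() / rr.mean()),              # spread relative to mean RR
        "rmssd_ms": float(np.sqrt(np.mean(successive**2)) * 1000.0),
    }


regular = rr_irregularity([0.0, 0.8, 1.6, 2.4, 3.2])
irregular = rr_irregularity([0.0, 0.6, 1.5, 1.9, 2.9])
print(round(regular["cv_rr"], 3), round(irregular["cv_rr"], 3))
```

A perfectly regular rhythm gives values near zero for both features, while the irregular sequence gives a coefficient of variation of about 0.33 and an RMSSD of about 483 ms.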
ECG-FM: An Open Electrocardiogram Foundation Model [3.8270632390229777]
We present ECG-FM, an open foundation model for ECG analysis, and conduct a study using a dataset of 1.5 million ECGs. ECG-FM is a transformer-based model pretrained using a hybrid contrastive and generative self-supervised learning approach. We affirm that ECG-FM is robust, label-efficient, and functionally discriminative by showcasing data scaling experiments, performing a latent space analysis, and generating saliency maps.
arXiv (2024-08-09T17:06:49Z)
ETP: Learning Transferable ECG Representations via ECG-Text Pre-training [10.856365645831728]
ECG-Text Pre-training (ETP) is an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports.
arXiv (2023-09-06T19:19:26Z)
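The summary says ETP aligns ECG signals with their textual reports but does not name the training objective. A common choice for this kind of cross-modal alignment is a CLIP-style symmetric InfoNCE loss, sketched here under that assumption: the ECG and report embeddings in the same batch row are a matched pair, and every other pairing in the batch acts as a negative. The encoders are omitted (embeddings are taken as given), and the temperature of 0.07 and the function names are assumptions.

```python
import numpy as np


def _log_softmax(logits, axis):
    """Numerically stable log-softmax along one axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def ecg_text_contrastive_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of ecg_emb and row i of text_emb form a matched pair."""
    e = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = e @ t.T / temperature                             # (batch, batch) scaled cosines
    idx = np.arange(len(e))
    loss_e2t = -_log_softmax(logits, axis=1)[idx, idx].mean()  # each ECG should pick its report
    loss_t2e = -_log_softmax(logits, axis=0)[idx, idx].mean()  # each report should pick its ECG
    return float((loss_e2t + loss_t2e) / 2)


matched = ecg_text_contrastive_loss(np.eye(4), np.eye(4))
shuffled = ecg_text_contrastive_loss(np.eye(4), np.eye(4)[::-1])
print(matched < shuffled)  # True
```

Using both directions makes the alignment symmetric, so ECG-to-report and report-to-ECG retrieval are trained together.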

This list is automatically generated from the titles and abstracts of the papers on this site.