DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization
- URL: http://arxiv.org/abs/2508.15338v1
- Date: Thu, 21 Aug 2025 08:13:37 GMT
- Title: DiagECG: An LLM-Driven Framework for Diagnostic Reasoning via Discretized ECG Tokenization
- Authors: Jinning Yang, Wen Shi,
- Abstract summary: We present DiagECG, a novel framework that integrates time-series and language modeling.<n>Our approach discretizes continuous ECG embeddings into symbolic tokens using a lead-independent encoder and quantization module.
- Score: 0.7550566004119158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electrocardiography plays a central role in cardiovascular diagnostics, yet existing automated approaches often struggle to generalize across clinical tasks and offer limited support for open-ended reasoning. We present DiagECG, a novel framework that integrates time-series and language modeling by enabling large language models to process 12-lead ECG signals for clinical text generation tasks. Our approach discretizes continuous ECG embeddings into symbolic tokens using a lead-independent encoder and quantization module. These tokens are then used to extend the vocabulary of LLM, allowing the model to handle both ECG and natural language inputs in a unified manner. To bridge the modality gap, we pretrain the model on an autoregressive ECG forecasting task, enabling the LLM to model temporal dynamics using its native language modeling capabilities. Finally, we perform instruction tuning on both ECG question answering and diagnostic report generation. Without modifying the core model, DiagECG achieves strong performance across tasks while maintaining generalization to out-of-distribution settings. Extensive experiments demonstrate the effectiveness of each component and highlight the potential of integrating symbolic ECG representations into LLMs for medical reasoning.
Related papers
- EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model [46.84040404474695]
EnECG is an ensemble-based framework that integrates multiple specialized foundation models, each excelling in different aspects of ECG interpretation.<n>We show that EnECG can help reduce computational and memory costs while maintaining the strong representational power of foundation models.<n>This framework not only enhances feature extraction and predictive performance but also ensures practical efficiency for real-world clinical applications.
arXiv Detail & Related papers (2025-11-28T07:22:33Z) - MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning.<n>We introduce TongueAtlas-4K, a benchmark comprising 4,000 images annotated with 22 diagnostic labels.
arXiv Detail & Related papers (2025-11-13T06:30:41Z) - Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation [52.19347532840774]
We propose SE-Diff, a novel physiological simulator and experience enhanced diffusion model for ECG generation.<n> SE-Diff integrates a lightweight ordinary differential equation (ODE)-based ECG simulator into the diffusion process via a beat decoder.<n>Extensive experiments on real-world ECG datasets demonstrate that SE-Diff improves both signal fidelity and text-ECG semantic alignment.
arXiv Detail & Related papers (2025-11-13T02:57:10Z) - WaveMind: Towards a Conversational EEG Foundation Model Aligned to Textual and Visual Modalities [55.00677513249723]
EEG signals simultaneously encode both cognitive processes and intrinsic neural states.<n>We map EEG signals and their corresponding modalities into a unified semantic space to achieve generalized interpretation.<n>The resulting model demonstrates robust classification accuracy while supporting flexible, open-ended conversations.
arXiv Detail & Related papers (2025-09-26T06:21:51Z) - From Token to Rhythm: A Multi-Scale Approach for ECG-Language Pretraining [22.214252217020174]
We introduce MELP, a novel Multi-scale ECG-Language Pretraining (MELP) model that fully leverages hierarchical supervision from ECG-text pairs.<n>We evaluate MELP on three public ECG datasets across multiple tasks, including zero-shot ECG classification, linear probing, and transfer learning.
arXiv Detail & Related papers (2025-06-11T07:22:17Z) - Heartcare Suite: Multi-dimensional Understanding of ECG with Raw Multi-lead Signal Modeling [50.58126509704037]
Heartcare Suite is a framework for fine-grained electrocardiogram (ECG) understanding.<n>Heartcare-220K is a high-quality, structured, and comprehensive multimodal ECG dataset.<n>Heartcare-Bench is a benchmark to guide the optimization of Medical Multimodal Large Language Models (Med-MLLMs) in ECG scenarios.
arXiv Detail & Related papers (2025-06-06T07:56:41Z) - PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design.<n>PhysLLM achieves state-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z) - A CNN-based Local-Global Self-Attention via Averaged Window Embeddings for Hierarchical ECG Analysis [1.0844302367985357]
We propose a novel Local-Global Attention ECG model (LGA-ECG) to address this limitation.<n>Our approach extracts queries by averaging embeddings obtained from overlapping convolutional windows.<n> Experiments conducted on the CODE-15 dataset demonstrate that LGA-ECG outperforms state-of-the-art models.
arXiv Detail & Related papers (2025-04-13T01:21:18Z) - GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images [43.65650710265957]
We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation.<n> GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations.<n>We propose the Grounded ECG task, a clinically motivated benchmark designed to assess the MLLM's capability in grounded ECG understanding.
arXiv Detail & Related papers (2025-03-08T05:48:53Z) - ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling [20.484166589932702]
Large Language Models (LLMs) have demonstrated exceptional versatility across domains, including applications to electrocardiograms (ECGs)<n>We propose ECG-Byte, an adapted byte pair encoding (BPE) tokenizer pipeline for autoregressive language modeling of ECGs.<n>We achieve competitive NLG performance while training 3 times faster and using just 48% of the data required by traditional two-stage methods.
arXiv Detail & Related papers (2024-12-18T22:13:21Z) - Electrocardiogram-Language Model for Few-Shot Question Answering with Meta Learning [19.513904491604794]
Electrocardiogram (ECG) interpretation requires specialized expertise.<n>This work introduces a novel multimodal meta-learning method for few-shot ECG question answering.
arXiv Detail & Related papers (2024-10-18T13:48:01Z) - MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation [28.35107188450758]
Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions.<n>Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation.<n>We propose the Multimodal ECG Instruction Tuning (MEIT) framework, the first attempt to tackle ECG report generation with LLMs and multimodal instructions.
arXiv Detail & Related papers (2024-03-07T23:20:56Z) - ECG-CL: A Comprehensive Electrocardiogram Interpretation Method Based on
Continual Learning [20.465733855762835]
Electrocardiogram (ECG) monitoring is one of the most powerful technique of cardiovascular disease (CVD) early identification.
Classic rule-based algorithms are now completely outperformed by deep learning based methods.
We propose a multi-resolution model that can sustain high-resolution low-level semantic information throughout.
arXiv Detail & Related papers (2023-04-10T15:19:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.