ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis
- URL: http://arxiv.org/abs/2502.17475v3
- Date: Mon, 07 Apr 2025 09:59:44 GMT
- Title: ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis
- Authors: Xu Wang, Jiaju Kang, Puyu Han, Yubao Zhao, Qian Liu, Liwenfei He, Lingqiong Zhang, Lingyun Dai, Yongcheng Wang, Jie Tao,
- Abstract summary: ECG-Expert-QA is a comprehensive dataset for evaluating diagnostic capabilities in electrocardiogram (ECG) interpretation. It combines real-world clinical ECG data with systematically generated synthetic cases, covering 12 essential diagnostic tasks. A key innovation is the support for multi-turn dialogues, enabling the development of conversational medical AI systems.
- Score: 8.059062779882554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ECG-Expert-QA, a comprehensive multimodal dataset for evaluating diagnostic capabilities in electrocardiogram (ECG) interpretation. It combines real-world clinical ECG data with systematically generated synthetic cases, covering 12 essential diagnostic tasks and totaling 47,211 expert-validated QA pairs. These encompass diverse clinical scenarios, from basic rhythm recognition to complex diagnoses involving rare conditions and temporal changes. A key innovation is the support for multi-turn dialogues, enabling the development of conversational medical AI systems that emulate clinician-patient or interprofessional interactions. This allows for more realistic assessment of AI models' clinical reasoning, diagnostic accuracy, and knowledge integration. Constructed through a knowledge-guided framework with strict quality control, ECG-Expert-QA ensures linguistic and clinical consistency, making it a high-quality resource for advancing AI-assisted ECG interpretation. It challenges models with tasks like identifying subtle ischemic changes and interpreting complex arrhythmias in context-rich scenarios. To promote research transparency and collaboration, the dataset, accompanying code, and prompts are publicly released at https://github.com/Zaozzz/ECG-Expert-QA
Related papers
- MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method.
It is designed to enhance interpretability and accuracy in biomedical imaging inquiries.
Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
- Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking [58.25862290294702]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow. We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and an MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z)
- Clinical Evaluation of Medical Image Synthesis: A Case Study in Wireless Capsule Endoscopy [63.39037092484374]
This study focuses on the clinical evaluation of medical Synthetic Data Generation using Artificial Intelligence (AI) models.
The paper contributes by a) presenting a protocol for the systematic evaluation of synthetic images by medical experts and b) applying it to assess TIDE-II, a novel variational autoencoder-based model for high-resolution WCE image synthesis.
The results show that TIDE-II generates clinically relevant WCE images, helping to address data scarcity and enhance diagnostic tools.
arXiv Detail & Related papers (2024-10-31T19:48:50Z)
- Electrocardiogram Report Generation and Question Answering via Retrieval-Augmented Self-Supervised Modeling [19.513904491604794]
ECG-ReGen is a retrieval-based approach for ECG-to-text report generation and question answering.
By combining pre-training with dynamic retrieval and Large Language Model (LLM)-based refinement, ECG-ReGen effectively analyzes ECG data and answers related queries.
arXiv Detail & Related papers (2024-09-13T12:50:36Z)
- GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [67.09501109871351]
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals.
GMAI-MMBench is the most comprehensive general medical AI benchmark with well-categorized data structure and multi-perceptual granularity to date.
It is constructed from 284 datasets across 38 medical image modalities, 18 clinical-related tasks, 18 departments, and 4 perceptual granularities in a Visual Question Answering (VQA) format.
arXiv Detail & Related papers (2024-08-06T17:59:21Z)
- medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs [13.806201934732321]
medIKAL combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL assigns weighted importance to entities in medical records based on their type, enabling precise localization of candidate diseases within KGs. We validated medIKAL's effectiveness through extensive experiments on a newly introduced open-sourced Chinese EMR dataset.
arXiv Detail & Related papers (2024-06-20T13:56:52Z)
- ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text [14.06147507373525]
This study introduces a new multimodal contrastive pretraining framework that aims to improve the quality and robustness of learned representations of 12-lead ECG signals.
Our framework comprises two key components: Cardio Query Assistant (CQA) and ECG Semantics Integrator (ESI).
arXiv Detail & Related papers (2024-05-26T06:45:39Z)
- Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems [0.7124736158080937]
We develop an evaluation paradigm that centers human understanding and decision-making.
We study the utility of generative AI systems in supporting people in a concrete task.
We evaluate two state-of-the-art generative AI systems against the radiologist's responses.
arXiv Detail & Related papers (2024-01-31T23:24:37Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Towards the Identifiability and Explainability for Personalized Learner Modeling: An Inductive Paradigm [36.60917255464867]
We propose an identifiable cognitive diagnosis framework (ID-CDF) based on a novel response-proficiency-response paradigm inspired by encoder-decoder models.
We show that ID-CDF can effectively address the problems without loss of diagnosis preciseness.
arXiv Detail & Related papers (2023-09-01T07:18:02Z)
- ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram [12.167108953668464]
ECG-QA is the first question answering dataset specifically designed for ECG analysis.
The dataset comprises a total of 70 question templates that cover a wide range of clinically relevant ECG topics.
Our dataset includes diverse ECG interpretation questions, including those that require a comparative analysis of two different ECGs.
arXiv Detail & Related papers (2023-06-21T07:14:57Z)
- Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report [28.608260758775316]
We introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models.
We propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data.
Our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
arXiv Detail & Related papers (2023-04-13T06:32:25Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
- MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG.
We propose two medical dialogue tasks based on the MedDG dataset: next entity prediction and doctor response generation.
Experimental results show that pre-trained language models and other baselines struggle on both tasks, performing poorly on our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z)