Integrating Large Language Models with Human Expertise for Disease Detection in Electronic Health Records
- URL: http://arxiv.org/abs/2504.00053v1
- Date: Mon, 31 Mar 2025 04:19:18 GMT
- Title: Integrating Large Language Models with Human Expertise for Disease Detection in Electronic Health Records
- Authors: Jie Pan, Seungwon Lee, Cheligeer Cheligeer, Elliot A. Martin, Kiarash Riazi, Hude Quan, Na Li,
- Abstract summary: This study developed an efficient strategy based on advanced large language models to identify multiple conditions from EHR clinical notes. We developed a pipeline that leveraged a generative large language model (LLM) to analyze, understand, and interpret EHR notes.
- Score: 6.395493502337635
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Objective: Electronic health records (EHR) are widely available to complement administrative data-based disease surveillance and healthcare performance evaluation. Defining conditions from EHR is labour-intensive and requires extensive manual labelling of disease outcomes. This study developed an efficient strategy based on advanced large language models to identify multiple conditions from EHR clinical notes. Methods: We linked a cardiac registry cohort in 2015 with an EHR system in Alberta, Canada. We developed a pipeline that leveraged a generative large language model (LLM) to analyze, understand, and interpret EHR notes using prompts based on specific diagnosis, treatment management, and clinical guidelines. The pipeline was applied to detect acute myocardial infarction (AMI), diabetes, and hypertension. The performance was compared against clinician-validated diagnoses as the reference standard and against widely adopted International Classification of Diseases (ICD) code-based methods. Results: The study cohort comprised 3,088 patients and 551,095 clinical notes. The prevalence was 55.4%, 27.7%, and 65.9% for AMI, diabetes, and hypertension, respectively. The performance of the LLM-based pipeline for detecting conditions varied: AMI had 88% sensitivity, 63% specificity, and 77% positive predictive value (PPV); diabetes had 91% sensitivity, 86% specificity, and 71% PPV; and hypertension had 94% sensitivity, 32% specificity, and 72% PPV. Compared with ICD codes, the LLM-based method demonstrated improved sensitivity and negative predictive value across all conditions. The monthly percentage trends of cases detected by the LLM and by the reference standard showed consistent patterns.
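The sensitivity, specificity, and PPV figures quoted in the abstract are standard confusion-matrix quantities. The following is a minimal Python sketch of how such metrics are typically computed. It is not the authors' code: the function names `diagnostic_metrics` and `ppv_from_prevalence` and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> DiagnosticMetrics:
    """Compute standard diagnostic metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator is non-zero.
    """
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Derive PPV from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    print(diagnostic_metrics(tp=90, fp=30, tn=70, fn=10))
    # Diabetes figures as quoted in the abstract (sensitivity, specificity, prevalence).
    print(round(ppv_from_prevalence(0.91, 0.86, 0.277), 3))
```

Applying `ppv_from_prevalence` to the diabetes figures in the abstract (91% sensitivity, 86% specificity, 27.7% prevalence) gives about 0.71, which agrees with the reported 71% PPV. Because the published percentages are rounded, such a cross-check can only be approximate.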
Related papers
- Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z) - Adaptive EEG-based stroke diagnosis with a GRU-TCN classifier and deep Q-learning thresholding [0.0]
We present an adaptive EEG multitask that converts 32-channel signals to power spectral density features (Welch). It uses a recurrent-convolutional network (GRU-TCN) to predict stroke type (healthy, ischemic, hemorrhagic), hemispheric lateralization, and severity, and applies a deep Q-network (DQN) to tune decision thresholds in real time.
arXiv Detail & Related papers (2025-10-28T18:48:48Z) - Evolving Diagnostic Agents in a Virtual Clinical Environment [75.59389103511559]
We present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning. Our method acquires diagnostic strategies through interactive exploration and outcome-based feedback. DiagAgent significantly outperforms 10 state-of-the-art LLMs, including DeepSeek-v3 and GPT-4o.
arXiv Detail & Related papers (2025-10-28T17:19:47Z) - An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z) - A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation [2.25639842999394]
Blood cultures are often over-ordered without clear justification.
In a study of 135,483 emergency department (ED) blood culture orders, we developed machine learning (ML) models to predict the risk of bacteremia.
arXiv Detail & Related papers (2025-04-09T21:12:29Z) - Congenital Heart Disease Classification Using Phonocardiograms: A Scalable Screening Tool for Diverse Environments [34.10187730651477]
Congenital heart disease (CHD) is a critical condition that demands early detection. This study presents a deep learning model designed to detect CHD using phonocardiogram (PCG) signals. We evaluated our model on several datasets, including the primary dataset from Bangladesh.
arXiv Detail & Related papers (2025-03-28T05:47:44Z) - From Knowledge Generation to Knowledge Verification: Examining the BioMedical Generative Capabilities of ChatGPT [45.6537455491436]
Our approach consists of two processes: generating disease-centric associations and verifying these associations.
Using ChatGPT as the selected LLM, we designed prompt-engineering processes to establish linkages between diseases and related drugs, symptoms, and genes.
arXiv Detail & Related papers (2025-02-20T16:39:57Z) - SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research [45.2233252981348]
Large Language Models have shown promising results in their ability to encode general medical knowledge.
We test the ability of state-of-the-art LLMs to leverage their internal knowledge and reasoning for epilepsy diagnosis.
arXiv Detail & Related papers (2024-07-03T11:02:12Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM)
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - ECG-Based Patient Identification: A Comprehensive Evaluation Across Health and Activity Conditions [0.0]
This paper presents a novel approach for patient identification in healthcare systems using electrocardiogram signals.
A convolutional neural network (CNN) is employed to classify users based on electrocardiomatrices, a specific type of image derived from ECG signals.
The proposed identification system is evaluated in multiple databases, achieving up to 99.84% accuracy on healthy subjects, 97.09% on patients with cardiovascular diseases, and 97.89% on mixed populations including both healthy and arrhythmic patients.
arXiv Detail & Related papers (2023-02-13T17:14:55Z) - Sparse Dynamical Features generation, application to Parkinson's Disease diagnosis [0.0]
We propose a new approach inspired by the functioning of the brain that uses the dynamics, frequency and temporal content of EEGs to extract new demarcating features of the disease.
The method was evaluated on a publicly available dataset containing EEG signals recorded during a 3-oddball auditory task involving N = 50 subjects, of whom 25 suffer from PD.
arXiv Detail & Related papers (2022-10-20T22:39:29Z) - Preliminary study on the impact of EEG density on TMS-EEG classification in Alzheimer's disease [48.42347515853289]
We use TMS-evoked EEG responses to classify Alzheimer's patients from healthy controls.
The accuracy, sensitivity, and specificity were 92.7%, 96.58%, and 88.2%, respectively.
arXiv Detail & Related papers (2022-05-19T20:34:04Z) - Identification of Ischemic Heart Disease by using machine learning technique based on parameters measuring Heart Rate Variability [50.591267188664666]
In this study, 18 non-invasive features (age, gender, left ventricular ejection fraction, and 15 obtained from HRV) of 243 subjects were used to train and validate a series of ANNs.
The best result was obtained using 7 input parameters and 7 hidden nodes, with accuracies of 98.9% and 82% on the training and validation datasets, respectively.
arXiv Detail & Related papers (2020-10-29T19:14:41Z) - COMPOSE: Cross-Modal Pseudo-Siamese Network for Patient Trial Matching [70.08786840301435]
We propose CrOss-Modal PseudO-SiamEse network (COMPOSE) to address these challenges for patient-trial matching.
Experiment results show COMPOSE can reach 98.0% AUC on patient-criteria matching and 83.7% accuracy on patient-trial matching.
arXiv Detail & Related papers (2020-06-15T21:01:33Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)