AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow
- URL: http://arxiv.org/abs/2409.18924v2
- Date: Tue, 1 Oct 2024 17:49:00 GMT
- Title: AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow
- Authors: Huizi Yu, Jiayan Zhou, Lingyao Li, Shan Chen, Jack Gallifant, Anye Shi, Xiang Li, Wenyue Hua, Mingyu Jin, Guang Chen, Yang Zhou, Zhao Li, Trisha Gupte, Ming-Li Chen, Zahra Azizi, Yongfeng Zhang, Themistocles L. Assimes, Xin Ma, Danielle S. Bitterman, Lin Lu, Lizhou Fan,
- Abstract summary: We develop an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and Reasoning Retrieval-Augmented Generation (Reasoning RAG) as the generation backbone.
Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization.
Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1)
- Score: 33.8495939261319
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Simulated patient systems play a crucial role in modern medical education and research, providing safe, integrative learning environments and enabling clinical decision-making simulations. Large Language Models (LLM) could advance simulated patient systems by replicating medical conditions and patient-doctor interactions with high fidelity and low cost. However, ensuring the effectiveness and trustworthiness of these systems remains a challenge, as they require a large, diverse, and precise patient knowledgebase, along with a robust and stable knowledge diffusion to users. Here, we developed AIPatient, an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and the Reasoning Retrieval-Augmented Generation (Reasoning RAG) agentic workflow as the generation backbone. AIPatient KG samples data from Electronic Health Records (EHRs) in the Medical Information Mart for Intensive Care (MIMIC)-III database, producing a clinically diverse and relevant cohort of 1,495 patients with high knowledgebase validity (F1 0.89). Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization. This agentic framework reaches an overall accuracy of 94.15% in EHR-based medical Question Answering (QA), outperforming benchmarks that use either no agent or only partial agent integration. Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1). The promising performance of the AIPatient system highlights its potential to support a wide range of applications, including medical education, model evaluation, and system integration.
Related papers
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - Detecting Bias and Enhancing Diagnostic Accuracy in Large Language Models for Healthcare [0.2302001830524133]
Biased AI-generated medical advice and misdiagnoses can jeopardize patient safety.
This study introduces new resources designed to promote ethical and precise AI in healthcare.
arXiv Detail & Related papers (2024-10-09T06:00:05Z) - Optimizing Mortality Prediction for ICU Heart Failure Patients: Leveraging XGBoost and Advanced Machine Learning with the MIMIC-III Database [1.5186937600119894]
Heart failure affects millions of people worldwide, significantly reducing quality of life and leading to high mortality rates.
Despite extensive research, the relationship between heart failure and mortality rates among ICU patients is not fully understood.
This study analyzed data from 1,177 patients over 18 years old from the MIMIC-III database, identified using ICD-9 codes.
arXiv Detail & Related papers (2024-09-03T07:57:08Z) - Enhanced Prediction of Ventilator-Associated Pneumonia in Patients with Traumatic Brain Injury Using Advanced Machine Learning Techniques [0.0]
Ventilator-associated pneumonia (VAP) in traumatic brain injury (TBI) patients poses a significant mortality risk.
Timely detection and prognostication of VAP in TBI patients are crucial to improve patient outcomes and alleviate the strain on healthcare resources.
We implemented six machine learning models using the MIMIC-III database.
arXiv Detail & Related papers (2024-08-02T09:44:18Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - The Potential of Wearable Sensors for Assessing Patient Acuity in
Intensive Care Unit (ICU) [12.359907390320453]
Acuity assessments are vital in critical care settings to provide timely interventions and fair resource allocation.
Traditional acuity scores do not incorporate granular information such as patients' mobility level, which can indicate recovery or deterioration in the ICU.
In this study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from EHR for developing an AI-driven acuity assessment score.
arXiv Detail & Related papers (2023-11-03T21:52:05Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic
Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM)
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - EVA: Generating Longitudinal Electronic Health Records Using Conditional
Variational Autoencoders [34.22731849545798]
We propose EHR Variational Autoencoder (EVA) for synthesizing sequences of discrete EHR encounters and encounter features.
We illustrate that EVA can produce realistic sequences, account for individual differences among patients, and can be conditioned on specific disease conditions.
We assess the utility of the methods on large real-world EHR repositories containing over 250, 000 patients.
arXiv Detail & Related papers (2020-12-18T02:37:49Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.