Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine
- URL: http://arxiv.org/abs/2412.18096v1
- Date: Tue, 24 Dec 2024 02:14:13 GMT
- Title: Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine
- Authors: Yu He Ke, Liyuan Jin, Kabilan Elangovan, Bryan Wen Xi Ong, Chin Yang Oh, Jacqueline Sim, Kenny Wei-Tsen Loh, Chai Rick Soh, Jonathan Ming Hua Cheng, Aaron Kwang Yang Lee, Daniel Shu Wei Ting, Nan Liu, Hairil Rizal Abdullah,
- Abstract summary: Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks.
This study describes the development and evaluation of the PErioperative AI atbot (PEACH), a secure-based system integrated with local guidelines to support clinical decision-making.
- Score: 2.0497272891338536
- License:
- Abstract: Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonet LLM framework within Pair Chat (developed by Singapore Government) and tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH demonstrated improved accuracy of 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018, 95% CI: 0.952-0.991). Minimal hallucinations and deviations were observed (both 1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability ranged from kappa 0.772-0.893 within PEACH and 0.610-0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. Future research should explore its scalability across specialties and its impact on clinical outcomes.
Related papers
- Primary Care Diagnoses as a Reliable Predictor for Orthopedic Surgical Interventions [0.10624941710159722]
Referral workflow inefficiencies contribute to suboptimal patient outcomes and higher healthcare costs.
In this study, we investigated the possibility of predicting procedural needs based on primary care diagnostic entries.
arXiv Detail & Related papers (2025-02-06T17:15:12Z) - Leveraging Large Language Models to Enhance Machine Learning Interpretability and Predictive Performance: A Case Study on Emergency Department Returns for Mental Health Patients [2.3769374446083735]
Emergency department (ED) returns for mental health conditions pose a major healthcare burden, with 24-27% of patients returning within 30 days.
To assess whether integrating large language models (LLMs) with machine learning improves predictive accuracy and clinical interpretability of ED mental health return risk models.
arXiv Detail & Related papers (2025-01-21T15:41:20Z) - Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding [44.01429184037945]
We introduce ALIGN, a novel compositional LLM-based system for automated, zero-shot medical coding.
We evaluate ALIGN on harmonizing medication terms into Anatomical Therapeutic Chemical (ATC) and medical history terms into Medical Dictionary for Regulatory Activities (MedDRA) codes.
arXiv Detail & Related papers (2024-11-20T09:59:12Z) - Artificial Intelligence-Based Triaging of Cutaneous Melanocytic Lesions [0.8864540224289991]
Pathologists are facing an increasing workload due to a growing volume of cases and the need for more comprehensive diagnoses.
We developed an artificial intelligence (AI) model for triaging cutaneous melanocytic lesions based on whole slide images.
arXiv Detail & Related papers (2024-10-14T13:49:04Z) - Machine Learning for ALSFRS-R Score Prediction: Making Sense of the Sensor Data [44.99833362998488]
Amyotrophic Lateral Sclerosis (ALS) is a rapidly progressive neurodegenerative disease that presents individuals with limited treatment options.
The present investigation, spearheaded by the iDPP@CLEF 2024 challenge, focuses on utilizing sensor-derived data obtained through an app.
arXiv Detail & Related papers (2024-07-10T19:17:23Z) - Bridging AI and Clinical Practice: Integrating Automated Sleep Scoring
Algorithm with Uncertainty-Guided Physician Review [0.0]
This study aims to enhance the clinical use of automated sleep-scoring algorithms by incorporating an uncertainty estimation approach.
Total of 19578 PSGs from 13 open-access databases were used to train U-Sleep, a state-of-the-art sleep-scoring algorithm.
arXiv Detail & Related papers (2023-12-22T15:58:09Z) - DeepCOVID-Fuse: A Multi-modality Deep Learning Model Fusing Chest
X-Radiographs and Clinical Variables to Predict COVID-19 Risk Levels [8.593516170110203]
DeepCOVID-Fuse is a deep learning fusion model to predict risk levels in coronavirus patients.
The accuracy of DeepCOVID-Fuse trained on CXRs and clinical variables is 0.658, with an AUC of 0.842.
arXiv Detail & Related papers (2023-01-20T20:54:25Z) - Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in
Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
arXiv Detail & Related papers (2021-11-18T00:43:41Z) - Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions.
We obtain multi-label classification accuracies of 0.73 and 0.58 for average precision, 0.56 and 0.35 for F1-scores and 0.71 and 0.4 accuracy in predicting principal diagnosis for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Automated Quantification of CT Patterns Associated with COVID-19 from
Chest CT [48.785596536318884]
The proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions.
The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities.
Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States.
arXiv Detail & Related papers (2020-04-02T21:49:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.