OPTIC: Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations using GPT-4 Data Labeling and Model Distillation
- URL: http://arxiv.org/abs/2503.05701v1
- Date: Wed, 05 Feb 2025 05:49:34 GMT
- Title: OPTIC: Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations using GPT-4 Data Labeling and Model Distillation
- Authors: Alberto Santamaria-Pang, Frank Tuan, Ross Campbell, Cindy Zhang, Ankush Jindal, Roopa Surapur, Brad Holloman, Deanna Hanisch, Rae Buckley, Carisa Cooney, Ivan Tarapov, Kimberly S. Peairs, Brian Hasselfeld, Peter Greene
- Abstract summary: This study seeks to develop an efficient tool for message triaging to reduce physician workload and improve patient-provider communication. We developed OPTIC, a powerful message triaging tool that utilizes GPT-4 for data labeling and BERT for model distillation. The BERT model achieved 88.85% accuracy on the test set validated by GPT-4 labeling, with a sensitivity of 88.29%, specificity of 89.38%, and an F1 score of 0.8842.
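The reported accuracy, sensitivity, specificity, and F1 score are standard confusion-matrix metrics with "Clinical" as the positive class. A minimal sketch of how they relate (the counts below are illustrative, chosen only to approximate the reported rates; they are not from the paper):

```python
def triage_metrics(tp: int, fp: int, tn: int, fn: int):
    """Confusion-matrix metrics, treating "Clinical" as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall on "Clinical" messages
    specificity = tn / (tn + fp)   # recall on "Admin" messages
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f1

# Illustrative counts (not the paper's): 88/100 Clinical and 89/100 Admin correct.
acc, sens, spec, f1 = triage_metrics(tp=88, fp=11, tn=89, fn=12)
```

Note that F1 combines precision and sensitivity, so it can sit between the accuracy and sensitivity figures, as in the reported results.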
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The COVID-19 pandemic has accelerated the adoption of telemedicine and patient messaging through electronic medical portals (patient medical advice requests, or PMARs). While these platforms enhance patient access to healthcare, they have also increased the burden on healthcare providers due to the surge in PMARs. This study seeks to develop an efficient tool for message triaging to reduce physician workload and improve patient-provider communication. We developed OPTIC (Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations), a powerful message triaging tool that utilizes GPT-4 for data labeling and BERT for model distillation. The study used a dataset of 405,487 patient messaging encounters from Johns Hopkins Medicine between January and June 2020. High-quality labeled data was generated through GPT-4-based prompt engineering, which was then used to train a BERT model to classify messages as "Admin" or "Clinical." The BERT model achieved 88.85% accuracy on the test set validated by GPT-4 labeling, with a sensitivity of 88.29%, specificity of 89.38%, and an F1 score of 0.8842. BERTopic analysis identified 81 distinct topics within the test data, with over 80% accuracy in classifying 58 topics. The system was successfully deployed through Epic's Nebula Cloud Platform, demonstrating its practical effectiveness in healthcare settings.
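The label-then-distill workflow described above can be sketched in miniature: a stub `teacher_label` stands in for GPT-4 prompt-based labeling, and a toy bag-of-words Naive Bayes stands in for the distilled BERT student. All function names, keyword rules, and example messages here are illustrative assumptions, not details from the paper:

```python
from collections import Counter
import math

def teacher_label(message: str) -> str:
    """Stub teacher: a keyword rule standing in for GPT-4 labeling."""
    clinical_terms = {"pain", "fever", "medication", "symptoms", "dose"}
    return "Clinical" if clinical_terms & set(message.lower().split()) else "Admin"

class NaiveBayesStudent:
    """Toy bag-of-words student trained on teacher-generated labels."""
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.class_counts:
            score = math.log(self.class_counts[c] / total)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in text.lower().split():
                score += math.log((self.word_counts[c][w] + 1) / denom)  # Laplace smoothing
            scores[c] = score
        return max(scores, key=scores.get)

messages = [
    "please refill my medication dose",
    "I have fever and pain since yesterday",
    "need to reschedule my appointment",
    "billing question about my last visit",
]
labels = [teacher_label(m) for m in messages]        # teacher labeling pass
student = NaiveBayesStudent().fit(messages, labels)  # distillation step
```

The point of the pattern: the expensive teacher labels the corpus once, and the cheap student then classifies new messages at production scale.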
Related papers
- A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers [51.45596445363302]
GlobeReady is a clinician-friendly AI platform that enables ocular disease diagnosis without retraining/fine-tuning or technical expertise.
It achieves high accuracy across imaging modalities: 93.9-98.5% for an 11-category fundus photo dataset and 87.2-92.7% for a 15-category OCT dataset.
It addresses domain shifts across centers and populations, reaching an average accuracy of 88.9% across five centers in China, 86.3% in Vietnam, and 90.2% in the UK.
arXiv Detail & Related papers (2025-04-22T14:17:22Z) - Evaluating the Feasibility and Accuracy of Large Language Models for Medical History-Taking in Obstetrics and Gynecology [4.48731404829722]
Effective physician-patient communication is critical but time-consuming, reducing clinic efficiency.
Recent advancements in Large Language Models (LLMs) offer a potential solution for automating medical history-taking and improving diagnostic accuracy.
An AI-driven conversational system was developed to simulate physician-patient interactions with ChatGPT-4o and ChatGPT-4o-mini.
Both models demonstrated strong feasibility in automating infertility history-taking, with ChatGPT-4o-mini excelling in completeness and extraction accuracy.
arXiv Detail & Related papers (2025-03-31T14:09:53Z) - A Scalable Approach to Benchmarking the In-Conversation Differential Diagnostic Accuracy of a Health AI [0.0]
This study introduces a scalable benchmarking methodology for assessing health AI systems. Our methodology employs 400 validated clinical vignettes across 14 medical specialties, using AI-powered patient actors to simulate realistic clinical interactions. August achieved a top-one diagnostic accuracy of 81.8% (327/400 cases) and a top-two accuracy of 85.0% (340/400 cases), significantly outperforming traditional symptom checkers.
arXiv Detail & Related papers (2024-12-17T05:02:33Z) - How Well Can Modern LLMs Act as Agent Cores in Radiology Environments? [54.36730060680139]
RadA-BenchPlat is an evaluation platform that benchmarks the performance of large language models (LLMs) in radiology environments.
The platform also defines ten categories of tools for agent-driven task solving and evaluates seven leading LLMs.
arXiv Detail & Related papers (2024-12-12T18:20:16Z) - Novel Clinical-Grade Prostate Cancer Detection and Grading Model: Development and Prospective Validation Using Real World Data, with Performance Assessment on IHC Requested Cases [1.9729379339863824]
We investigated the performance of an institutionally developed system for prostate cancer detection, grading, and workflow optimization.
We developed models for cancer detection, grading, and screening of equivocal cases for IHC ordering.
arXiv Detail & Related papers (2024-10-31T05:29:18Z) - MIMIC-IV-Ext-PE: Using a large language model to predict pulmonary embolism phenotype in the MIMIC-IV dataset [0.0]
Pulmonary embolism is a leading cause of preventable in-hospital mortality.
There are few large publicly available datasets that contain PE labels for research.
We extracted all available radiology reports of CTPA scans, and two physicians manually labeled each result as PE positive (acute PE) or PE negative.
We applied a previously finetuned Bio_ClinicalBERT transformer language model, VTE-BERT, to extract labels automatically.
arXiv Detail & Related papers (2024-10-29T19:28:44Z) - AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow [33.8495939261319]
We develop an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and Reasoning Retrieval-Augmented Generation (Reasoning RAG) as the generation backbone.
Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization.
Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1).
arXiv Detail & Related papers (2024-09-27T17:17:15Z) - STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z) - MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text [1.3923237289777164]
Pre-trained language models, such as generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks.
We evaluated the performance of PPI identification of multiple GPT and BERT models using three manually curated gold-standard corpora.
arXiv Detail & Related papers (2023-03-30T22:06:10Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.