Generalization in Healthcare AI: Evaluation of a Clinical Large Language
Model
- URL: http://arxiv.org/abs/2402.10965v2
- Date: Sat, 24 Feb 2024 13:17:38 GMT
- Title: Generalization in Healthcare AI: Evaluation of a Clinical Large Language
Model
- Authors: Salman Rahman, Lavender Yao Jiang, Saadia Gabriel, Yindalon
Aphinyanaphongs, Eric Karl Oermann and Rumi Chunara
- Abstract summary: Large language models (LLMs) provide opportunities in healthcare for improved patient care, clinical decision-making, and enhancement of physician and administrator workflows.
The potential of these models importantly depends on their ability to generalize effectively across clinical environments and populations.
We evaluated ClinicLLM, an LLM trained on [HOSPITAL]'s clinical notes, analyzing its performance on 30-day all-cause readmission prediction.
- Score: 9.83029774375588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in large language models (LLMs) provide new opportunities in
healthcare for improved patient care, clinical decision-making, and enhancement
of physician and administrator workflows. However, the potential of these
models importantly depends on their ability to generalize effectively across
clinical environments and populations, a challenge often underestimated in
early development. To better understand reasons for these challenges and inform
mitigation approaches, we evaluated ClinicLLM, an LLM trained on [HOSPITAL]'s
clinical notes, analyzing its performance on 30-day all-cause readmission
prediction focusing on variability across hospitals and patient
characteristics. We found poorer generalization particularly in hospitals with
fewer samples, among patients with government and unspecified insurance, the
elderly, and those with high comorbidities. To understand reasons for lack of
generalization, we investigated sample sizes for fine-tuning, note content
(number of words per note), patient characteristics (comorbidity level, age,
insurance type, borough), and health system aspects (hospital, all-cause 30-day
readmission, and mortality rates). We used descriptive statistics and
supervised classification to identify features. We found that, along with
sample size, patient age, number of comorbidities, and the number of words in
notes are all important factors related to generalization. Finally, we compared
local fine-tuning (hospital specific), instance-based augmented fine-tuning and
cluster-based fine-tuning for improving generalization. Among these, local
fine-tuning proved most effective, increasing AUC by 0.25% to 11.74% (most
helpful in settings with limited data). Overall, this study provides new
insights for enhancing the deployment of large language models in the
societally important domain of healthcare, and improving their performance for
broader populations.
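The abstract compares local (hospital-specific) fine-tuning against pooled alternatives using per-hospital AUC on 30-day readmission prediction. As a rough illustration of that evaluation setup only (not the authors' code or model), the sketch below trains a pooled classifier and per-hospital "local" classifiers on synthetic features standing in for note embeddings, then compares AUC by hospital; all names, sizes, and the data generation are hypothetical, and a simple logistic regression is used as a stand-in for the fine-tuned LLM.
```python
# Minimal sketch (not the paper's code): compare a pooled model with
# hospital-specific ("local") models on per-hospital AUC for a binary
# 30-day readmission label. All data, features, and hospital IDs are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_hospitals, n_features = 4, 16

rows = []
for h in range(n_hospitals):
    n = int(rng.integers(300, 1200))             # unequal sample sizes per hospital
    w = rng.normal(size=n_features)              # hospital-specific signal
    X = rng.normal(size=(n, n_features))         # stand-in for note embeddings
    p = 1.0 / (1.0 + np.exp(-(X @ w) * 0.5 - 1.0))  # synthetic readmission risk
    y = rng.binomial(1, p)
    rows.append((np.full(n, h), X, y))

hosp = np.concatenate([r[0] for r in rows])
X = np.vstack([r[1] for r in rows])
y = np.concatenate([r[2] for r in rows])

X_tr, X_te, y_tr, y_te, h_tr, h_te = train_test_split(
    X, y, hosp, test_size=0.3, stratify=hosp, random_state=0)

# Pooled analogue: one classifier trained on data from all hospitals.
pooled = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for h in range(n_hospitals):
    tr, te = h_tr == h, h_te == h
    # Local fine-tuning analogue: a classifier trained only on this hospital.
    local = LogisticRegression(max_iter=1000).fit(X_tr[tr], y_tr[tr])
    auc_pooled = roc_auc_score(y_te[te], pooled.predict_proba(X_te[te])[:, 1])
    auc_local = roc_auc_score(y_te[te], local.predict_proba(X_te[te])[:, 1])
    print(f"hospital {h}: pooled AUC={auc_pooled:.3f}, "
          f"local AUC={auc_local:.3f}, delta={auc_local - auc_pooled:+.3f}")
```
The per-hospital deltas printed here mirror the kind of comparison the abstract reports (gains from local fine-tuning ranging from 0.25% to 11.74% AUC), though the synthetic data will not reproduce those numbers.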
Related papers
- MIL vs. Aggregation: Evaluating Patient-Level Survival Prediction Strategies Using Graph-Based Learning [52.231128973251124]
We compare various strategies for predicting survival at the WSI and patient level.
The former treats each WSI as an independent sample, mimicking the strategy adopted in other works.
The latter comprises methods to either aggregate the predictions of the several WSIs or automatically identify the most relevant slide.
arXiv Detail & Related papers (2025-03-29T11:14:02Z) - Patients Speak, AI Listens: LLM-based Analysis of Online Reviews Uncovers Key Drivers for Urgent Care Satisfaction [26.391629320279904]
This study collects Google Maps reviews across the DMV and Florida areas.
We first analyze the geospatial patterns of various aspects, including population density, median income, GINI Index, rent-to-income ratio, household below poverty rate, no insurance rate, and unemployment rate.
arXiv Detail & Related papers (2025-03-26T20:45:01Z) - Enhancing In-Hospital Mortality Prediction Using Multi-Representational Learning with LLM-Generated Expert Summaries [3.5508427067904864]
In-hospital mortality (IHM) prediction for ICU patients is critical for timely interventions and efficient resource allocation.
This study integrates structured physiological data and clinical notes with Large Language Model (LLM)-generated expert summaries to improve IHM prediction accuracy.
arXiv Detail & Related papers (2024-11-25T16:36:38Z) - Predicting 30-Day Hospital Readmission in Medicare Patients: Insights from an LSTM Deep Learning Model [4.918444397807014]
This study analyzes Medicare hospital readmissions using LSTM networks with feature engineering to assess feature contributions.
The LSTM model is designed to capture temporal dynamics from admission-level and patient-level data.
The major features were the Charlson Comorbidity Index, hospital length of stay, and the number of hospital admissions over the past six months, while demographic variables were less impactful.
arXiv Detail & Related papers (2024-10-23T03:50:32Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language models (LLMs) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - Towards Personalised Patient Risk Prediction Using Temporal Hospital Data Trajectories [0.9545101073027095]
We propose a pipeline that groups intensive care unit patients by the trajectories of observations data throughout their stay.
Applying the pipeline to data from just the first four hours of each ICU stay assigns the majority of patients to the same cluster as when the entire stay duration is considered.
arXiv Detail & Related papers (2024-07-12T15:53:26Z) - Predictive and Prescriptive Analytics for Multi-Site Modeling of Frail
and Elderly Patient Services [0.0]
The aim of this research is to assess how various predictive and prescriptive analytical methods contribute to addressing the operational challenges within an area of healthcare that is growing in demand.
On the prescriptive side, deterministic and two-stage programs are developed to determine how to optimally plan for beds and ward staff.
Our research reveals that healthcare managers should consider using predictive and prescriptive models to make more informed decisions.
arXiv Detail & Related papers (2023-11-13T12:25:45Z) - Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of
Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z) - SANSformers: Self-Supervised Forecasting in Electronic Health Records
with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z) - Benchmarking Automated Clinical Language Simplification: Dataset,
Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z) - Using Deep Learning and Explainable Artificial Intelligence in Patients'
Choices of Hospital Levels [10.985001960872264]
This study used nationwide insurance data, accumulated possible features discussed in existing literature, and used a deep neural network to predict patients' choices of hospital levels.
The results showed that the model was able to predict with high area under the receiver operating characteristic curve (AUC) (0.90), accuracy (0.90), sensitivity (0.94), and specificity (0.97) despite highly imbalanced labels.
arXiv Detail & Related papers (2020-06-24T02:15:15Z)