The Foundational Capabilities of Large Language Models in Predicting Postoperative Risks Using Clinical Notes
- URL: http://arxiv.org/abs/2402.17493v5
- Date: Sat, 31 Aug 2024 19:42:31 GMT
- Title: The Foundational Capabilities of Large Language Models in Predicting Postoperative Risks Using Clinical Notes
- Authors: Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu
- Abstract summary: We examine the performance of large language models (LLMs) in predicting six postoperative risks using various fine-tuning strategies.
Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%.
The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision.
- Score: 7.42249589630227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical notes recorded during a patient's perioperative journey hold immense informational value, yet much of it remains untapped. Advances in large language models (LLMs) offer opportunities for bridging this gap. Using 84,875 pre-operative notes and their associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks using various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute AUROC of 38.3% and AUPRC of 33.2%. Self-supervised fine-tuning further improved performance by 3.2% and 1.5%. Incorporating labels into training further increased AUROC by 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, with improvements of 3.6% for AUROC and 2.6% for AUPRC compared to self-supervision, highlighting the foundational capabilities of LLMs in predicting postoperative risks, which could be beneficial when deployed for perioperative care.
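The abstract's headline numbers are absolute AUROC differences between fine-tuning strategies. A minimal sketch of how such a comparison is computed, using the rank-based (Mann-Whitney) formulation of AUROC and entirely hypothetical risk scores (the labels, score values, and variable names below are illustrative, not the paper's data):

```python
def auroc(labels, scores):
    """AUROC = P(a random positive is scored above a random negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; ties contribute half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for one postoperative outcome:
y_true   = [0, 0, 1, 0, 1, 1, 0, 1]
baseline = [0.2, 0.4, 0.3, 0.1, 0.6, 0.5, 0.3, 0.4]  # e.g. word embeddings
llm      = [0.1, 0.2, 0.7, 0.2, 0.8, 0.9, 0.3, 0.6]  # e.g. pretrained LLM

delta = auroc(y_true, llm) - auroc(y_true, baseline)
print(f"absolute AUROC gain: {delta:.3f}")  # an "absolute" gain, e.g. +0.125
```

An "absolute AUROC of 38.3%" in the abstract is this kind of raw difference (0.383), not a relative improvement.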
Related papers
- A Novel Evaluation Benchmark for Medical LLMs: Illuminating Safety and Effectiveness in Clinical Domains [15.73821689524201]
Large language models (LLMs) hold promise in clinical decision support but face major challenges in safety evaluation and effectiveness validation. We developed the Clinical Safety-Effectiveness Dual-Track Benchmark (CSEDB), a multidimensional framework built on clinical expert consensus. Thirty-two specialist physicians developed and reviewed 2,069 open-ended Q&A items aligned with these criteria, spanning 26 clinical departments to simulate real-world scenarios.
arXiv Detail & Related papers (2025-07-31T12:10:00Z) - MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks [47.486705282473984]
Large language models (LLMs) achieve near-perfect scores on medical exams. These evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an evaluation framework for assessing LLM performance on medical tasks.
arXiv Detail & Related papers (2025-05-26T22:55:49Z) - Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting LOS in the ICU, specifically for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost, and CatBoost) and neural networks (LSTM, BERT, and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z) - ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification [57.22053411719822]
ChestX-Reasoner is a radiology diagnosis MLLM designed to leverage process supervision mined directly from clinical reports.
Our two-stage training framework combines supervised fine-tuning and reinforcement learning guided by process rewards to better align model reasoning with clinical standards.
arXiv Detail & Related papers (2025-04-29T16:48:23Z) - The Role of Machine Learning in Reducing Healthcare Costs: The Impact of Medication Adherence and Preventive Care on Hospitalization Expenses [18.97832426593808]
This study reveals the important role of prevention care and medication adherence in reducing hospitalizations.
Four machine learning models Logistic Regression, Gradient Boosting, Random Forest, and Artificial Neural Networks are applied to predict five-year hospitalization risk.
High medication adherence and consistent preventive care reduce hospitalization risk by 38.3% and 37.7%, respectively.
arXiv Detail & Related papers (2025-04-10T03:28:42Z) - Primary Care Diagnoses as a Reliable Predictor for Orthopedic Surgical Interventions [0.10624941710159722]
Referral workflow inefficiencies contribute to suboptimal patient outcomes and higher healthcare costs.
In this study, we investigated the possibility of predicting procedural needs based on primary care diagnostic entries.
arXiv Detail & Related papers (2025-02-06T17:15:12Z) - Leveraging Large Language Models to Enhance Machine Learning Interpretability and Predictive Performance: A Case Study on Emergency Department Returns for Mental Health Patients [2.3769374446083735]
Emergency department (ED) returns for mental health conditions pose a major healthcare burden, with 24-27% of patients returning within 30 days.
We assess whether integrating large language models (LLMs) with machine learning improves the predictive accuracy and clinical interpretability of ED mental health return risk models.
arXiv Detail & Related papers (2025-01-21T15:41:20Z) - Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain Expertise [19.71388941192149]
We train a PRM to provide step-level reward signals for clinical notes generated by large language models (LLMs).
Our proposed PRM, trained on the LLaMA-3.1 8B instruct model, outperformed both Gemini-Pro 1.5 and the vanilla outcome-supervised reward model (ORM) in two key evaluations.
arXiv Detail & Related papers (2024-12-17T06:24:34Z) - A Novel Generative Multi-Task Representation Learning Approach for Predicting Postoperative Complications in Cardiac Surgery Patients [7.42249589630227]
Machine learning can be leveraged to identify and predict patient risks for postoperative complications.
We developed and validated the effectiveness of predicting postoperative complications using a novel surgical Variational Autoencoder.
surgVAE uncovers intrinsic patterns via cross-task and cross-cohort representation learning.
arXiv Detail & Related papers (2024-12-02T20:24:02Z) - DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR [1.4699314771635081]
Delirium is an acute confusional state that has been shown to affect up to 31% of patients in the intensive care unit (ICU).
We develop and validate DeLLiriuM on ICU admissions from 104,303 patients pertaining to 195 hospitals across three large databases.
arXiv Detail & Related papers (2024-10-22T18:56:31Z) - Closing the gap between open-source and commercial large language models for medical evidence summarization [20.60798771155072]
Large language models (LLMs) hold great promise in summarizing medical evidence.
Most recent studies focus on the application of proprietary LLMs.
While open-source LLMs allow better transparency and customization, their performance falls short compared to proprietary ones.
arXiv Detail & Related papers (2024-07-25T05:03:01Z) - Automatic Prediction of Amyotrophic Lateral Sclerosis Progression using Longitudinal Speech Transformer [56.17737749551133]
We propose ALS longitudinal speech transformer (ALST), a neural network-based automatic predictor of ALS disease progression.
By taking advantage of high-quality pretrained speech features and longitudinal information in the recordings, our best model achieves 91.0% AUC.
ALST is capable of fine-grained and interpretable predictions of ALS progression, especially for distinguishing between rarer and more severe cases.
arXiv Detail & Related papers (2024-06-26T13:28:24Z) - Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Modified Risk Formulation for Improving the Prediction of Knee Osteoarthritis Progression [36.12790384412525]
Current methods for predicting osteoarthritis (OA) outcomes do not incorporate disease specific prior knowledge.
We developed a novel approach that effectively uses consecutive imaging studies to improve OA outcome predictions.
arXiv Detail & Related papers (2024-06-14T15:24:49Z) - RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness [102.06442250444618]
We introduce RLAIF-V, a novel framework that aligns MLLMs in a fully open-source paradigm.
RLAIF-V maximally explores open-source MLLMs from two perspectives, including high-quality feedback data generation.
Experiments on six benchmarks in both automatic and human evaluation show that RLAIF-V substantially enhances the trustworthiness of models.
arXiv Detail & Related papers (2024-05-27T14:37:01Z) - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [56.00992369295851]
Open-source Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they remain far inferior to API-based models when acting as agents.
This paper delivers three key observations: (1) the current agent training corpus entangles format following with agent reasoning, which significantly shifts it from the distribution of the pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches introduce hallucinations as a side-effect of improving agent abilities.
We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
arXiv Detail & Related papers (2024-03-19T16:26:10Z) - Sequential Model for Predicting Patient Adherence in Subcutaneous Immunotherapy for Allergic Rhinitis [16.386676205583697]
Subcutaneous Immunotherapy (SCIT) is the long-lasting causal treatment of allergic rhinitis (AR).
This study aims to leverage novel machine learning models to precisely predict the risk of non-adherence of AR patients.
arXiv Detail & Related papers (2024-01-21T09:55:47Z) - Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs [56.526095828316386]
We propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of large language models (LLMs).
We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods.
arXiv Detail & Related papers (2023-10-18T03:34:59Z) - Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open source models of varying sizes from 7B - 65B, on average, improve 8.2% from their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z) - Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest federated ML study to date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
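The multi-institution setup in the federated tumor-boundary study above rests on the standard federated averaging (FedAvg) idea: each site trains on its own data, and a server aggregates the model weights, weighted by local sample counts. A minimal sketch with hypothetical sites and parameter vectors (the weights, sizes, and function name are illustrative, not from the study):

```python
def fedavg(site_weights, site_sizes):
    """Weighted average of per-site parameter vectors (one FedAvg round)."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]

# Three hypothetical sites with 2-parameter local models:
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
sizes = [100, 300, 600]
print(fedavg(weights, sizes))  # global model pulled toward the larger sites
```

Raw patient data never leaves a site; only the weight vectors are exchanged, which is what makes studies across 71 institutions feasible under data-governance constraints.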
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.