Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction
- URL: http://arxiv.org/abs/2511.15357v1
- Date: Wed, 19 Nov 2025 11:34:47 GMT
- Title: Cost-Aware Prediction (CAP): An LLM-Enhanced Machine Learning Pipeline and Decision Support System for Heart Failure Mortality Prediction
- Authors: Yinan Yu, Falk Dippel, Christina E. Lundberg, Martin Lindgren, Annika Rosengren, Martin Adiels, Helen Sjöland
- Abstract summary: This paper introduces a cost-aware prediction (CAP) framework that combines cost-benefit analysis assisted by large language model (LLM) agents. We developed an ML model predicting 1-year mortality in patients with heart failure to identify those eligible for home care.
- Score: 2.6045133848979214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: Machine learning (ML) predictive models are often developed without considering downstream value trade-offs and clinical interpretability. This paper introduces a cost-aware prediction (CAP) framework that combines cost-benefit analysis, assisted by large language model (LLM) agents, to communicate the trade-offs involved in applying ML predictions. Materials and Methods: We developed an ML model predicting 1-year mortality in patients with heart failure (N = 30,021, 22% mortality) to identify those eligible for home care. We then introduced clinical impact projection (CIP) curves to visualize important cost dimensions - quality of life and healthcare provider expenses, further divided into treatment and error costs - to assess the clinical consequences of predictions. Finally, we used four LLM agents to generate patient-specific descriptions. The system was evaluated by clinicians for its decision support value. Results: The eXtreme gradient boosting (XGB) model achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.804 (95% confidence interval (CI) 0.792-0.816), an area under the precision-recall curve (AUPRC) of 0.529 (95% CI 0.502-0.558), and a Brier score of 0.135 (95% CI 0.130-0.140). Discussion: The CIP cost curves provided a population-level overview of cost composition across decision thresholds, whereas the LLM agents provided cost-benefit analyses at the individual patient level. The system was well received in the clinicians' evaluation, but their feedback emphasized the need to strengthen technical accuracy for speculative tasks. Conclusion: CAP utilizes LLM agents to integrate ML classifier outcomes with cost-benefit analysis for more transparent and interpretable decision support.
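The abstract's core quantitative ideas - a Brier score for the classifier and CIP-style cost curves that decompose expected per-patient cost (treatment plus error costs) across decision thresholds - can be sketched in a few lines. This is not the authors' implementation; the function names and the cost parameters (`c_treat`, `c_fp`, `c_fn`) are illustrative placeholders, assumed here only to show the shape of the computation:

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def cost_curve(y_true, y_prob, c_treat, c_fp, c_fn, thresholds=None):
    """Expected per-patient cost at each decision threshold.

    Every patient flagged positive incurs a treatment cost; false positives
    and false negatives incur additional error costs. All costs are
    hypothetical inputs, not values from the paper.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in thresholds:
        pred = y_prob >= t                      # flag patients at or above threshold
        fp = np.sum(pred & ~y_true)             # flagged but survived
        fn = np.sum(~pred & y_true)             # missed mortality
        total = pred.sum() * c_treat + fp * c_fp + fn * c_fn
        costs.append(total / len(y_true))       # average cost per patient
    return np.asarray(thresholds), np.asarray(costs)
```

Plotting the returned cost array against the thresholds, optionally with the treatment and error components tracked separately, gives a population-level view of cost composition analogous to the CIP curves described above.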
Related papers
- Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z) - Case Prompting to Mitigate Large Language Model Bias for ICU Mortality Prediction [17.91443453604627]
Large language models (LLMs) show promise in predicting outcomes from structured medical data. LLMs may exhibit demographic biases related to sex, age, and race, limiting their trustworthy use in clinical practice. We propose a training-free, clinically adaptive prompting framework to simultaneously improve fairness and performance.
arXiv Detail & Related papers (2025-12-17T12:29:53Z) - COPE: Chain-Of-Thought Prediction Engine for Open-Source Large Language Model Based Stroke Outcome Prediction from Clinical Notes [23.044580867637105]
Chain-of-Thought (CoT) Outcome Prediction Engine (COPE) is a reasoning-enhanced large language model framework for predicting outcomes from unstructured clinical notes. This study included 464 acute ischemic stroke (AIS) patients with discharge summaries and 90-day modified Rankin Scale (mRS) scores. COPE achieved an MAE of 1.01 (95% CI 0.92-1.11), +/-1 accuracy of 74.4% (69.9, 78.8%), and exact accuracy of 32.8% (28.0, 37.6%).
arXiv Detail & Related papers (2025-12-02T07:44:20Z) - CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation [18.396334867873307]
The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. We present an agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer.
arXiv Detail & Related papers (2025-09-09T01:49:29Z) - Early Mortality Prediction in ICU Patients with Hypertensive Kidney Disease Using Interpretable Machine Learning [3.4335475695580127]
Hypertensive kidney disease (HKD) patients in intensive care units (ICUs) face high short-term mortality. We developed a machine learning framework to predict 30-day in-hospital mortality among ICU patients with HKD.
arXiv Detail & Related papers (2025-07-25T00:48:23Z) - AUTOCT: Automating Interpretable Clinical Trial Prediction with LLM Agents [47.640779069547534]
AutoCT is a novel framework that combines the reasoning capabilities of large language models with the explainability of classical machine learning. We show that AutoCT performs on par with or better than SOTA methods on clinical trial prediction tasks within only a limited number of self-refinement iterations.
arXiv Detail & Related papers (2025-06-04T11:50:55Z) - Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting LOS in the ICU, specifically for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost, and CatBoost) and neural networks (LSTM, BERT, and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z) - Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references. We propose a framework encompassing three critical stages - examination recommendation, diagnostic decision-making, and treatment planning - simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z) - Explainable AI for Mental Health Emergency Returns: Integrating LLMs with Predictive Modeling [2.466324275447403]
Emergency department (ED) returns for mental health conditions pose a major healthcare burden, with 24-27% of patients returning within 30 days. This work assesses whether integrating large language models (LLMs) with machine learning improves the predictive accuracy and clinical interpretability of ED mental health return risk models.
arXiv Detail & Related papers (2025-01-21T15:41:20Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - Improving Clinical Decision Support through Interpretable Machine Learning and Error Handling in Electronic Health Records [6.594072648536156]
Trust-MAPS translates clinical domain knowledge into high-dimensional, mixed-integer programming models. Trust-scores emerge as clinically meaningful features that not only boost predictive performance for clinical decision support tasks, but also lend interpretability to ML models.
arXiv Detail & Related papers (2023-08-21T15:14:49Z) - Optimal discharge of patients from intensive care via a data-driven policy learning framework [58.720142291102135]
It is important that the patient discharge task addresses the nuanced trade-off between decreasing a patient's length of stay and the risk of readmission or even death following the discharge decision.
This work introduces an end-to-end general framework for capturing this trade-off to recommend optimal discharge timing decisions.
A data-driven approach is used to derive a parsimonious, discrete state space representation that captures a patient's physiological condition.
arXiv Detail & Related papers (2021-12-17T04:39:33Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality, and length of stay are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.