Related papers: Knowledge Graph Augmented Large Language Models for Disease Prediction

URL: http://arxiv.org/abs/2512.01210v2
Date: Tue, 02 Dec 2025 21:43:54 GMT
Title: Knowledge Graph Augmented Large Language Models for Disease Prediction
Authors: Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel
Abstract summary: Knowledge graph (KG)-guided chain-of-thought (CoT) framework generates clinically grounded reasoning for visit-level disease prediction in MIMIC-III. ICD-9 codes are mapped to PrimeKG, from which disease-relevant nodes and multi-hop reasoning paths are extracted and used as scaffolds for CoT generation. KG-guided models outperform strong classical baselines, achieving AUROC values of 0.66 to 0.70 and macro-AUPR values of 0.40 to 0.47. A blinded clinician evaluation shows consistent preference for KG-guided CoT explanations in clarity, relevance, and clinical correctness.
Score: 24.992170033802537
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Electronic health records (EHRs) support powerful clinical prediction models, but existing methods typically provide coarse, post hoc explanations that offer limited value for patient-level decision making. We introduce a knowledge graph (KG)-guided chain-of-thought (CoT) framework that generates clinically grounded and temporally consistent reasoning for visit-level disease prediction in MIMIC-III. ICD-9 codes are mapped to PrimeKG, from which disease-relevant nodes and multi-hop reasoning paths are extracted and used as scaffolds for CoT generation; only explanations whose conclusions match observed outcomes are retained. Lightweight LLaMA-3.1-Instruct-8B and Gemma-7B models are then fine-tuned on this supervision corpus. Across ten PrimeKG-mapped diseases and limited training cohorts (400 and 1000 cases), KG-guided models outperform strong classical baselines, achieving AUROC values of 0.66 to 0.70 and macro-AUPR values of 0.40 to 0.47. The models also transfer zero-shot to the CRADLE cohort, raising accuracy from approximately 0.40-0.51 to 0.72-0.77. A blinded clinician evaluation shows consistent preference for KG-guided CoT explanations in clarity, relevance, and clinical correctness.
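Illustrative sketch (not from the paper): the abstract describes two mechanical steps, extracting multi-hop paths between knowledge-graph nodes and retaining only explanations whose conclusions match the observed outcome. The Python below shows one plausible reading of those steps on a toy graph. The graph contents, the function names multi_hop_paths and keep_matching, and the record fields "predicted" and "observed" are all hypothetical; none of them come from PrimeKG or from the authors' code.

```python
from collections import deque

# Toy undirected graph standing in for a knowledge graph. Node names and
# edges are invented for illustration and are NOT taken from PrimeKG.
TOY_KG = {
    "hypertension": ["chronic kidney disease", "heart failure"],
    "chronic kidney disease": ["hypertension", "anemia"],
    "heart failure": ["hypertension", "atrial fibrillation"],
    "anemia": ["chronic kidney disease"],
    "atrial fibrillation": ["heart failure"],
}


def multi_hop_paths(graph, source, target, max_hops=3):
    """Return all simple paths from source to target using at most max_hops edges."""
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            # Hop budget exhausted without reaching the target.
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbor])
    return paths


def keep_matching(explanations):
    """Keep only explanations whose predicted label equals the observed outcome."""
    return [e for e in explanations if e["predicted"] == e["observed"]]


# Hypothetical usage: a path scaffold plus two candidate explanations,
# one consistent with the observed outcome and one not.
paths = multi_hop_paths(TOY_KG, "anemia", "heart failure", max_hops=3)
candidates = [
    {"path": paths[0], "predicted": True, "observed": True},
    {"path": paths[0], "predicted": True, "observed": False},
]
retained = keep_matching(candidates)
```

In this toy setup, lowering max_hops to 2 yields no path between the two nodes, which shows how the hop limit bounds the reasoning scaffolds available for explanation generation.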

Related papers

Suppressing Prior-Comparison Hallucinations in Radiology Report Generation via Semantically Decoupled Latent Steering [94.37535002230504]
We develop a training-free, inference-time control framework termed Semantically Decoupled Latent Steering. Our approach constructs a semantic-free intervention vector via large language model (LLM)-driven semantic decomposition. We show that our approach significantly reduces the probability of historical hallucinations.
arXiv Detail & Related papers (2026-02-27T04:49:01Z)
TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins [33.30007167473537]
We build Genie Digital Twin (GDT) on 93,054 patients across cancer types. GDT significantly reduces forecasting error, achieving a median Mean Absolute Scaled Error (MASE) of 0.87. GDT generalizes to out-of-distribution clinical trials, matching trained baselines at zero-shot and surpassing them with fine-tuning.
arXiv Detail & Related papers (2026-01-28T15:40:54Z)
Explainable Admission-Level Predictive Modeling for Prolonged Hospital Stay in Elderly Populations: Challenges in Low- and Middle-Income Countries [65.4286079244589]
Prolonged length of stay (pLoS) is a significant factor associated with the risk of adverse in-hospital events. We develop and explain a predictive model for pLoS using admission-level patient and hospital administrative data.
arXiv Detail & Related papers (2026-01-07T23:35:24Z)
COPE: Chain-Of-Thought Prediction Engine for Open-Source Large Language Model Based Stroke Outcome Prediction from Clinical Notes [23.044580867637105]
Chain-of-Thought (CoT) Outcome Prediction Engine (COPE) is a reasoning-enhanced large language model framework for predicting outcomes from unstructured clinical notes. This study included 464 acute ischemic stroke (AIS) patients with discharge summaries and 90-day modified Rankin Scale (mRS) scores. COPE achieved an MAE of 1.01 (95% CI 0.92-1.11), +/-1 accuracy of 74.4% (69.9, 78.8%), and exact accuracy of 32.8% (28.0, 37.6%).
arXiv Detail & Related papers (2025-12-02T07:44:20Z)
Chronic Kidney Disease Prognosis Prediction Using Transformer [2.054117570146147]
Chronic Kidney Disease (CKD) affects nearly 10% of the global population and often progresses to end-stage renal failure. We present a transformer-based framework for predicting CKD progression using multi-modal electronic health records.
arXiv Detail & Related papers (2025-11-04T07:52:17Z)
Teaching AI Stepwise Diagnostic Reasoning with Report-Guided Chain-of-Thought Learning [11.537036709742345]
DiagCoT is a framework that applies supervised fine-tuning to general-purpose vision-language models (VLMs). DiagCoT combines contrastive image-report tuning for domain alignment, chain-of-thought supervision to capture inferential logic, and reinforcement tuning with clinical reward signals to enhance factual accuracy and fluency. It outperformed state-of-the-art models including LLaVA-Med and CXR-LLAVA on long-tailed diseases and external datasets.
arXiv Detail & Related papers (2025-09-08T08:01:26Z)
A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer [54.58205672910646]
RenalCLIP is a visual-language foundation model for characterization, diagnosis and prognosis of renal mass. It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
arXiv Detail & Related papers (2025-08-22T17:48:19Z)
A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler [49.03919553747297]
We propose an AI-powered, real-time Circle of Willis (CoW) auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using Transcranial Color-coded Doppler (TCCD). The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels.
arXiv Detail & Related papers (2025-08-19T14:41:22Z)
Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV [49.1574468325115]
This study explores multiple ML approaches for predicting ICU length of stay (LOS) specifically for patients with neurological diseases, based on the MIMIC-IV dataset. The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and neural networks (LSTM, BERT and Temporal Fusion Transformer).
arXiv Detail & Related papers (2025-05-23T14:06:42Z)
CRTRE: Causal Rule Generation with Target Trial Emulation Framework [47.2836994469923]
We introduce a novel method called causal rule generation with target trial emulation framework (CRTRE). CRTRE applies randomized trial design principles to estimate the causal effect of association rules. We then incorporate these association rules into downstream applications such as the prediction of disease onsets.
arXiv Detail & Related papers (2024-11-10T02:40:06Z)
Learning to diagnose cirrhosis from radiological and histological labels with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists to predict the histological score available on a small annex dataset. We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis. This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z)
Contrastive learning-based pretraining improves representation and transferability of diabetic retinopathy classification models [1.9882302955470608]
Self-supervised contrastive learning (CL)-based pretraining allows the development of robust and generalized deep learning models with small, labeled datasets. This paper aims to evaluate the effect of CL-based pretraining on the performance of referrable vs. non-referrable diabetic retinopathy (DR) classification.
arXiv Detail & Related papers (2022-08-24T14:07:45Z)
An Interpretable Web-based Glioblastoma Multiforme Prognosis Prediction Tool using Random Forest Model [1.1024591739346292]
We propose predictive models that estimate GBM patients' health status one year after treatment. We used the clinical profiles of a total of 467 GBM patients, each consisting of 13 features and two follow-up dates. Our machine learning models suggest that the top three prognostic factors for GBM patient survival were MGMT gene promoter, the extent of resection, and age.
arXiv Detail & Related papers (2021-08-30T07:56:34Z)
Ensemble model for pre-discharge icd10 coding prediction [45.82374977939355]
We propose an ensemble model incorporating multiple clinical data sources for accurate code predictions. We obtain multi-label classification accuracies of 0.73 and 0.58 for average precision, 0.56 and 0.35 for F1-scores and 0.71 and 0.4 accuracy in predicting principal diagnosis for inpatient and outpatient datasets respectively.
arXiv Detail & Related papers (2020-12-16T07:02:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.