Brittleness and Promise: Knowledge Graph Based Reward Modeling for Diagnostic Reasoning
- URL: http://arxiv.org/abs/2509.18316v1
- Date: Mon, 22 Sep 2025 18:39:09 GMT
- Title: Brittleness and Promise: Knowledge Graph Based Reward Modeling for Diagnostic Reasoning
- Authors: Saksham Khatwani, He Cheng, Majid Afshar, Dmitriy Dligach, Yanjun Gao,
- Abstract summary: Large language models (LLMs) show promise for diagnostic reasoning but often lack reliable, knowledge grounded inference.<n>We treat the LLM as a reward model of KG reasoning paths, where the model learns to judge whether a candidate path leads to correct diagnosis for a given patient input.<n>Our finding provides the first systematic assessment of "reward model style" reasoning over clinical KGs.
- Score: 8.35131510062609
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) show promise for diagnostic reasoning but often lack reliable, knowledge grounded inference. Knowledge graphs (KGs), such as the Unified Medical Language System (UMLS), offer structured biomedical knowledge that can support trustworthy reasoning. Prior approaches typically integrate KGs via retrieval augmented generation or fine tuning, inserting KG content into prompts rather than enabling structured reasoning. We explore an alternative paradigm: treating the LLM as a reward model of KG reasoning paths, where the model learns to judge whether a candidate path leads to correct diagnosis for a given patient input. This approach is inspired by recent work that leverages reward training to enhance model reasoning abilities, and grounded in computational theory, which suggests that verifying a solution is often easier than generating one from scratch. It also parallels physicians' diagnostic assessment, where they judge which sequences of findings and intermediate conditions most plausibly support a diagnosis. We first systematically evaluate five task formulation for knowledge path judging and eight training paradigm. Second, we test whether the path judging abilities generalize to downstream diagnostic tasks, including diagnosis summarization and medical question answering. Experiments with three open source instruct-tuned LLMs reveal both promise and brittleness: while specific reward optimization and distillation lead to strong path-judging performance, the transferability to downstream tasks remain weak. Our finding provides the first systematic assessment of "reward model style" reasoning over clinical KGs, offering insights into how structured, reward-based supervision influences diagnostic reasoning in GenAI systems for healthcare.
Related papers
- PathReasoner-R1: Instilling Structured Reasoning into Pathology Vision-Language Model via Knowledge-Guided Policy Optimization [6.821738567680833]
We construct PathReasoner, the first large-scale dataset of whole-slide image (WSI) reasoning.<n>PathReasoner-R1 synergizes supervised fine-tuning with reasoning-oriented reinforcement learning to instill structured chain-of-thought capabilities.<n>Experiments demonstrate that PathReasoner-R1 achieves state-of-the-art performance on both PathReasoner and public benchmarks across various image scales.
arXiv Detail & Related papers (2026-01-29T12:21:16Z) - Reasoning Visual Language Model for Chest X-Ray Analysis [30.318629424154206]
We present a framework that brings chain-of-thought (CoT) reasoning to chest X-ray interpretation.<n>Inspired by reasoning-first training paradigms, our approach is designed to learn how experts reason, not just what they conclude.<n>We release code and the model NV-Reason-CXR-3B to support community progress toward trustworthy, explainable AI in chest radiography.
arXiv Detail & Related papers (2025-10-28T00:48:00Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning [52.12425911708585]
Deep-DxSearch is an agentic RAG system trained end-to-end with reinforcement learning (RL)<n>In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources.<n> Experiments demonstrate that our end-to-end RL training framework consistently outperforms prompt-engineering and training-free RAG approaches.
arXiv Detail & Related papers (2025-08-21T17:42:47Z) - Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning.<n>This paper provides the first systematic review of this emerging field.<n>We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z) - Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - KERAP: A Knowledge-Enhanced Reasoning Approach for Accurate Zero-shot Diagnosis Prediction Using Multi-agent LLMs [39.47350988195002]
Large language models (LLMs) have shown promise in leveraging language abilities and biomedical knowledge for diagnosis prediction.<n>We propose KERAP, a knowledge graph (KG)-enhanced reasoning approach that improves LLM-based diagnosis prediction through a multi-agent architecture.<n>Our framework consists of a linkage agent for mapping, a retrieval agent for structured knowledge extraction, and a prediction agent that iteratively refines diagnosis predictions.
arXiv Detail & Related papers (2025-07-03T16:35:11Z) - Bridging Stepwise Lab-Informed Pretraining and Knowledge-Guided Learning for Diagnostic Reasoning [20.369746122143063]
We propose a dual-expertise framework that combines two complementary sources of information.<n>For external knowledge, we construct a Diagnosis Knowledge Graph (KG) that encodes both hierarchical language and semantic relations enriched by large models.<n>We introduce a lab-informed proxy task that guides the model to follow a clinically consistent stepwise reasoning process based on lab test signals.
arXiv Detail & Related papers (2024-10-25T20:25:22Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG)<n>MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.<n>We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Towards the Identifiability and Explainability for Personalized Learner
Modeling: An Inductive Paradigm [36.60917255464867]
We propose an identifiable cognitive diagnosis framework (ID-CDF) based on a novel response-proficiency-response paradigm inspired by encoder-decoder models.
We show that ID-CDF can effectively address the problems without loss of diagnosis preciseness.
arXiv Detail & Related papers (2023-09-01T07:18:02Z) - Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study [6.10474409373543]
We propose an innovative approach for augmenting the proficiency of Large Language Models (LLMs) in automated diagnosis generation.<n>We derive the KG from the National Library of Medicine's Unified Medical Language System (UMLS), a robust repository of biomedical knowledge.<n>Our approach offers an explainable diagnostic pathway, edging us closer to the realization of AI-augmented diagnostic decision support systems.
arXiv Detail & Related papers (2023-08-28T06:05:18Z) - NeuralSympCheck: A Symptom Checking and Disease Diagnostic Neural Model
with Logic Regularization [59.15047491202254]
symptom checking systems inquire users for their symptoms and perform a rapid and affordable medical assessment of their condition.
We propose a new approach based on the supervised learning of neural models with logic regularization.
Our experiments show that the proposed approach outperforms the best existing methods in the accuracy of diagnosis when the number of diagnoses and symptoms is large.
arXiv Detail & Related papers (2022-06-02T07:57:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.