Related papers: Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports

Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports

URL: http://arxiv.org/abs/2511.01052v1
Date: Sun, 02 Nov 2025 19:00:40 GMT
Title: Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports
Authors: Yeawon Lee, Christopher C. Yang, Chia-Hsuan Chang, Grace Lu-Yao,
Abstract summary: We introduce two Knowledge Elicitation methods designed to overcome limitations by enabling large language models to induce and apply domain-specific rules for cancer staging.<n>The first, Knowledge Elicitation with Long-Term Memory (KEwLTM), uses an iterative prompting strategy to derive staging rules directly from unannotated pathology reports.<n>The second, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), employs a variation of RAG where rules are pre-extracted from relevant guidelines in a single step and then applied, enhancing interpretability and avoiding repeated retrieval overhead.
Score: 2.5829043503611318
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cancer staging is critical for patient prognosis and treatment planning, yet extracting pathologic TNM staging from unstructured pathology reports poses a persistent challenge. Existing natural language processing (NLP) and machine learning (ML) strategies often depend on large annotated datasets, limiting their scalability and adaptability. In this study, we introduce two Knowledge Elicitation methods designed to overcome these limitations by enabling large language models (LLMs) to induce and apply domain-specific rules for cancer staging. The first, Knowledge Elicitation with Long-Term Memory (KEwLTM), uses an iterative prompting strategy to derive staging rules directly from unannotated pathology reports, without requiring ground-truth labels. The second, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), employs a variation of RAG where rules are pre-extracted from relevant guidelines in a single step and then applied, enhancing interpretability and avoiding repeated retrieval overhead. We leverage the ability of LLMs to apply broad knowledge learned during pre-training to new tasks. Using breast cancer pathology reports from the TCGA dataset, we evaluate their performance in identifying T and N stages, comparing them against various baseline approaches on two open-source LLMs. Our results indicate that KEwLTM outperforms KEwRAG when Zero-Shot Chain-of-Thought (ZSCOT) inference is effective, whereas KEwRAG achieves better performance when ZSCOT inference is less effective. Both methods offer transparent, interpretable interfaces by making the induced rules explicit. These findings highlight the promise of our Knowledge Elicitation methods as scalable, high-performing solutions for automated cancer staging with enhanced interpretability, particularly in clinical settings with limited annotated data.

Related papers

CANDLE: A Cross-Modal Agentic Knowledge Distillation Framework for Interpretable Sarcopenia Diagnosis [3.0245458192729466]
CANDLE mitigates the interpretability-performance trade-off, enhances predictive accuracy, and preserves high decision consistency.<n>The framework offers a scalable approach to knowledge assetization of TML models, enabling interpretable, reproducible, and clinically aligned decision support in sarcopenia and potentially broader medical domains.
arXiv Detail & Related papers (2025-07-26T15:50:08Z)
Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond [55.984684518346924]
We recast Knowledge Tracing as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable.<n>Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text.<n> Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories.
arXiv Detail & Related papers (2025-06-20T13:21:14Z)
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains.<n>Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning.<n>Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering.<n>We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
Leaps Beyond the Seen: Reinforced Reasoning Augmented Generation for Clinical Notes [10.897880916802864]
ReinRAG is a reasoning augmented generation (RAG) for long-form discharge instructions based on pre-admission information.<n>To bridge the information gap, we propose group-based retriever optimization (GRO) which improves retrieval quality with group-normalized rewards.<n>Experiments on the real-world dataset show that ReinRAG outperforms baselines in both clinical efficacy and natural language generation metrics.
arXiv Detail & Related papers (2025-06-03T12:59:52Z)
Tripartite-GraphRAG via Plugin Ontologies [0.011161220996480647]
Large Language Models (LLMs) have shown remarkable capabilities across various domains, yet they struggle with knowledge-intensive tasks.<n>Key limitations include their tendency to hallucinate, lack of source traceability (provenance) and challenges in timely knowledge updates.<n>We propose a novel approach that combines LLMs with a tripartite knowledge graph representation.
arXiv Detail & Related papers (2025-04-28T10:43:35Z)
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs [0.0]
gait impairment plays an important role in early diagnosis, disease monitoring, and treatment evaluation for neurodegenerative diseases.<n>Recent deep learning-based approaches have consistently improved classification accuracies, but they often lack interpretability.<n>We introduce AGIR, a novel pipeline consisting of a pre-trained VQ-VAE motion tokenizer and a Large Language Model (LLM) fine-tuned over pairs of motion tokens.
arXiv Detail & Related papers (2025-03-23T17:12:16Z)
Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning [87.10396098919013]
Large Language Models (LLMs) have demonstrated extensive knowledge and remarkable proficiency in temporal reasoning.<n>We propose a Large Language Models-guided Dynamic Adaptation (LLM-DA) method for reasoning on Temporal Knowledge Graphs.<n>LLM-DA harnesses the capabilities of LLMs to analyze historical data and extract temporal logical rules.
arXiv Detail & Related papers (2024-05-23T04:54:37Z)
XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare [16.79952669254101]
We introduce a knowledge-guided in-context learning framework to enable large language models to process structured clinical data.<n>Our approach integrates domain-specific feature groupings, carefully balanced few-shot examples, and task-specific prompting strategies.
arXiv Detail & Related papers (2024-05-10T06:52:44Z)
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process. Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z)
Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning. They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG) CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure. Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.