CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
- URL: http://arxiv.org/abs/2509.07325v1
- Date: Tue, 09 Sep 2025 01:49:29 GMT
- Title: CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
- Authors: Alyssa Unell, Noel C. F. Codella, Sam Preston, Peniel Argaw, Wen-wai Yim, Zelalem Gero, Cliff Wong, Rajesh Jena, Eric Horvitz, Amanda K. Hall, Ruican Rachel Zhong, Jiachen Li, Shrey Jain, Mu Wei, Matthew Lungren, Hoifung Poon
- Abstract summary: The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. We present an agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The National Comprehensive Cancer Network (NCCN) provides evidence-based guidelines for cancer treatment. Translating complex patient presentations into guideline-compliant treatment recommendations is time-intensive, requires specialized expertise, and is prone to error. Advances in large language model (LLM) capabilities promise to reduce the time required to generate treatment recommendations and improve accuracy. We present an LLM agent-based approach to automatically generate guideline-concordant treatment trajectories for patients with non-small cell lung cancer (NSCLC). Our contributions are threefold. First, we construct a novel longitudinal dataset of 121 NSCLC patient cases that includes clinical encounters, diagnostic results, and medical histories, each expertly annotated with the corresponding NCCN guideline trajectories by board-certified oncologists. Second, we demonstrate that existing LLMs possess domain-specific knowledge that enables high-quality proxy benchmark generation for both model development and evaluation, achieving strong correlation (Spearman coefficient r=0.88, RMSE = 0.08) with expert-annotated benchmarks. Third, we develop a hybrid approach combining expensive human annotations with model consistency information to create both the agent framework that predicts the relevant guidelines for a patient and a meta-classifier that verifies prediction accuracy with calibrated confidence scores for treatment recommendations (AUROC=0.800), a critical capability for communicating the accuracy of outputs, custom-tailoring tradeoffs in performance, and supporting regulatory compliance. This work establishes a framework for clinically viable LLM-based guideline adherence systems that balance accuracy, interpretability, and regulatory requirements while reducing annotation costs, providing a scalable pathway toward automated clinical decision support.
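The "internal disagreement" idea in the abstract, using agreement among repeated model predictions as a confidence signal for the meta-classifier, can be sketched minimally. This is an illustrative assumption, not the paper's actual method: the function name, the flat trajectory labels, and the single majority-agreement score are hypothetical simplifications of the structured guideline trajectories and calibrated classifier described above.

```python
from collections import Counter

def consistency_confidence(predictions):
    """Return the majority prediction and the fraction of samples agreeing with it.

    `predictions` is a list of guideline-trajectory labels produced by repeated
    samples of the same LLM on one patient case (hypothetical flat labels; the
    paper's agents output full NCCN trajectories).
    """
    counts = Counter(predictions)
    label, top = counts.most_common(1)[0]
    return label, top / len(predictions)

# Example: five sampled trajectories for one hypothetical NSCLC case
samples = ["NSCL-7", "NSCL-7", "NSCL-7", "NSCL-8", "NSCL-7"]
label, conf = consistency_confidence(samples)
print(label, conf)  # prints: NSCL-7 0.8
```

In a calibrated system such a raw agreement score would be one input feature to a trained meta-classifier rather than the confidence estimate itself.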
Related papers
- Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification [60.18369393468405]
Existing verifiers usually underperform owing to a lack of domain knowledge and limited calibration. GLEAN compiles expert-curated protocols into trajectory-informed, well-calibrated correctness signals. We empirically validate GLEAN with agentic clinical diagnosis across three diseases from the MIMIC-IV dataset.
arXiv Detail & Related papers (2026-03-03T09:36:43Z)
- A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine [59.78991974851707]
Large language models (LLMs) have demonstrated strong performance on medical benchmarks, including question answering and diagnosis. Most medical LLMs are trained on data from a single institution, which faces limitations in generalizability and safety in heterogeneous systems. We introduce a model-agnostic and parameter-efficient federated learning framework for adapting LLMs to medical applications.
arXiv Detail & Related papers (2026-01-29T18:48:21Z)
- Attention-Based Offline Reinforcement Learning and Clustering for Interpretable Sepsis Treatment [0.7209528581296429]
A clustering-based stratification module categorizes patients into low, intermediate, and high-risk groups upon ICU admission. A synthetic data augmentation pipeline leveraging variational autoencoders (VAEs) and diffusion models enriches underrepresented trajectories such as fluid or vasopressor administration. A rationale generation module powered by a multi-modal large language model produces natural-language justifications grounded in clinical context.
arXiv Detail & Related papers (2026-01-20T18:41:44Z)
- Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models [5.778370321351782]
We introduce a framework that uses Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC). GKC converts laboratory, genomic, and medication data into high-fidelity, task-aligned features. We benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer.
arXiv Detail & Related papers (2025-12-01T23:56:45Z)
- Prior-informed optimization of treatment recommendation via bandit algorithms trained on large language model-processed historical records [0.6875312133832079]
Current medical practice depends on standardized treatment frameworks and empirical methodologies that neglect individual patient variations. We develop a comprehensive system integrating Large Language Models (LLMs), Conditional Tabular Generative Adversarial Networks (CTGAN), T-learner counterfactual models, and contextual bandit approaches.
arXiv Detail & Related papers (2025-10-21T18:57:00Z)
- LGE-Guided Cross-Modality Contrastive Learning for Gadolinium-Free Cardiomyopathy Screening in Cine CMR [51.11296719862485]
We propose a contrastive learning and cross-modal alignment framework for gadolinium-free cardiomyopathy screening using cine CMR sequences. By aligning the latent spaces of cine CMR and Late Gadolinium Enhancement (LGE) sequences, our model encodes fibrosis-specific pathology into cine CMR embeddings.
arXiv Detail & Related papers (2025-08-23T07:21:23Z)
- CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records [30.100759175769454]
Large Language Models (LLMs) hold significant promise for improving clinical decision support and reducing physician burnout. We propose CliCARE, a framework for grounding LLMs in clinical guidelines for decision support over longitudinal cancer electronic health records. We validated our framework using large-scale, longitudinal data from a private Chinese cancer dataset and the public English MIMIC-IV dataset.
arXiv Detail & Related papers (2025-07-30T10:02:16Z)
- LLM-Augmented Symptom Analysis for Cardiovascular Disease Risk Prediction: A Clinical NLP [2.2615384250361004]
This study introduces a novel LLM-augmented clinical NLP pipeline that employs domain-adapted large language models for symptom extraction, contextual reasoning, and correlation from free-text reports. Evaluations on MIMIC-III and CARDIO-NLP datasets demonstrate improved performance in precision, recall, F1-score, and AUROC, with high clinical relevance.
arXiv Detail & Related papers (2025-07-15T07:32:16Z)
- Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Models (MedVLMs) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled Classifier-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
- EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports [0.0]
We introduce a novel question-answering (QA) dataset using echocardiogram reports sourced from the Medical Information Mart for Intensive Care database. This dataset is specifically designed to enhance QA systems in cardiology, consisting of 771,244 QA pairs addressing a wide array of cardiac abnormalities and their severity. We compare large language models (LLMs), including open-source and biomedical-specific models for zero-shot evaluation, and closed-source models for zero-shot and three-shot evaluation.
arXiv Detail & Related papers (2025-03-04T07:45:45Z)
- Novel Development of LLM Driven mCODE Data Model for Improved Clinical Trial Matching to Enable Standardization and Interoperability in Oncology Research [0.15346678870160887]
Cancer costs reached over $208 billion in 2023 alone. Traditional methods for clinical trial enrollment and clinical care in oncology are often manual, time-consuming, and lack a data-driven approach. This paper presents a novel framework to streamline standardization, interoperability, and exchange of cancer domains.
arXiv Detail & Related papers (2024-10-18T17:31:35Z)
- Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
arXiv Detail & Related papers (2024-10-06T18:46:28Z)
- Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.