Related papers: CPGPrompt: Translating Clinical Guidelines into LLM-Executable Decision Support

URL: http://arxiv.org/abs/2601.03475v1
Date: Wed, 07 Jan 2026 00:05:42 GMT
Title: CPGPrompt: Translating Clinical Guidelines into LLM-Executable Decision Support
Authors: Ruiqi Deng, Geoffrey Martin, Tony Wang, Gongbo Zhang, Yi Liu, Chunhua Weng, Yanshan Wang, Justin F Rousseau, Yifan Peng
Abstract summary: We develop and validate CPGPrompt, an auto-prompting system that converts narrative clinical guidelines into large language models (LLMs). Our framework translates CPGs into structured decision trees and utilizes an LLM to dynamically navigate them for patient case evaluation. System performance was assessed on both binary specialty-referral decisions and fine-grained pathway-classification tasks.
Score: 18.887576751340884
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Clinical practice guidelines (CPGs) provide evidence-based recommendations for patient care; however, integrating them into Artificial Intelligence (AI) remains challenging. Previous approaches, such as rule-based systems, face significant limitations, including poor interpretability, inconsistent adherence to guidelines, and narrow domain applicability. To address this, we develop and validate CPGPrompt, an auto-prompting system that converts narrative clinical guidelines into large language models (LLMs). Our framework translates CPGs into structured decision trees and utilizes an LLM to dynamically navigate them for patient case evaluation. Synthetic vignettes were generated across three domains (headache, lower back pain, and prostate cancer) and distributed into four categories to test different decision scenarios. System performance was assessed on both binary specialty-referral decisions and fine-grained pathway-classification tasks. The binary specialty referral classification achieved consistently strong performance across all domains (F1: 0.85-1.00), with high recall (1.00 $\pm$ 0.00). In contrast, multi-class pathway assignment showed reduced performance, with domain-specific variations: headache (F1: 0.47), lower back pain (F1: 0.72), and prostate cancer (F1: 0.77). Domain-specific performance differences reflected the structure of each guideline. The headache guideline highlighted challenges with negation handling. The lower back pain guideline required temporal reasoning. In contrast, prostate cancer pathways benefited from quantifiable laboratory tests, resulting in more reliable decision-making.
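The tree-navigation idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Node` structure, the `navigate` function, the toy tree, and the `keyword_stub` answerer are hypothetical and are not taken from the paper or from any clinical guideline. In a real system the answer function would be an LLM call; here a substring check stands in for it so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """One node of a guideline decision tree (hypothetical structure)."""
    question: Optional[str] = None   # yes/no criterion; None on a leaf
    yes: Optional["Node"] = None     # child followed when the answer is True
    no: Optional["Node"] = None      # child followed when the answer is False
    pathway: Optional[str] = None    # pathway label; set only on a leaf


def navigate(
    root: Node,
    case: str,
    answer_fn: Callable[[str, str], bool],
) -> Tuple[str, List[Tuple[str, bool]]]:
    """Walk the tree from the root to a leaf.

    answer_fn(question, case) decides each branch. It is a placeholder
    for an LLM call, which is an assumption of this sketch.
    Returns the leaf's pathway label and the (question, answer) trace.
    """
    node = root
    trace: List[Tuple[str, bool]] = []
    while node.pathway is None:
        answer = answer_fn(node.question, case)
        trace.append((node.question, answer))
        node = node.yes if answer else node.no
    return node.pathway, trace


def keyword_stub(question: str, case: str) -> bool:
    """Toy stand-in for the LLM: True if the criterion text occurs in the case."""
    return question.lower() in case.lower()


# Toy tree with placeholder criteria and pathway labels (not clinical content).
TOY_TREE = Node(
    question="criterion_a",
    yes=Node(pathway="pathway_1"),
    no=Node(
        question="criterion_b",
        yes=Node(pathway="pathway_2"),
        no=Node(pathway="pathway_3"),
    ),
)

if __name__ == "__main__":
    pathway, trace = navigate(TOY_TREE, "criterion_b documented", keyword_stub)
    print(pathway, trace)
```

The returned trace records which criterion was asked and how it was answered at each step, so a decision can be audited. The substring stub ignores negation (a case stating that a criterion is absent would still match), which is the kind of difficulty the abstract reports for the headache guideline.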

Related papers

MMedExpert-R1: Strengthening Multimodal Medical Reasoning via Domain-Specific Adaptation and Clinical Guideline Reinforcement [63.82954136824963]
Medical Vision-Language Models excel at perception tasks but struggle with the complex clinical reasoning required in real-world scenarios. We propose a novel reasoning MedVLM that addresses these challenges through domain-specific adaptation and guideline reinforcement.
arXiv Detail & Related papers (2026-01-16T02:32:07Z)
Orchestrator Multi-Agent Clinical Decision Support System for Secondary Headache Diagnosis in Primary Care [13.520457515792534]
We present a large language model (LLM)-based multi-agent clinical decision support system built on an orchestrator-specialist architecture. The system decomposes diagnosis into seven domain-specialized agents, each producing a structured and evidence-grounded rationale. We evaluated the multi-agent system using 90 expert-validated secondary headache cases and compared its performance with a single-LLM baseline.
arXiv Detail & Related papers (2025-12-03T19:26:12Z)
A Locally Executable AI System for Improving Preoperative Patient Communication: A Multi-Domain Clinical Evaluation [1.9205944025326396]
LENOHA is a safety-first, local-first system that routes inputs with high-precision sentence-transformer constraints. It returns verbatim answers from a clinician-curated FAQ for clinical queries. Energy logging shows that the non-generative clinical path consumes 1.0 mWh per input versus 168 mWh per small-talk reply.
arXiv Detail & Related papers (2025-10-02T04:53:11Z)
Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
Design and Validation of a Responsible Artificial Intelligence-based System for the Referral of Diabetic Retinopathy Patients [65.57160385098935]
Early detection of Diabetic Retinopathy can reduce the risk of vision loss by up to 95%. We developed RAIS-DR, a Responsible AI System for DR screening that incorporates ethical principles across the AI lifecycle. We evaluated RAIS-DR against the FDA-approved EyeArt system on a local dataset of 1,046 patients, unseen by both systems.
arXiv Detail & Related papers (2025-08-17T21:54:11Z)
LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer [12.795639054336226]
Preoperative assessment of lymph node metastasis in rectal cancer guides treatment decisions. Some artificial intelligence models operate as black boxes, lacking the interpretability needed for clinical trust. We introduce LRMR, an LLM-Driven Relational Multi-node Ranking framework.
arXiv Detail & Related papers (2025-07-15T16:29:45Z)
Beyond the LUMIR challenge: The pathway to foundational registration models [25.05315856123745]
The Large-scale Unsupervised Brain MRI Image Registration (LUMIR) challenge is a next-generation benchmark designed to assess and advance unsupervised brain MRI registration. LUMIR provides over 4,000 preprocessed T1-weighted brain MRIs for training without any label maps, encouraging biologically plausible deformation modeling. A total of 1,158 subjects and over 4,000 image pairs were included for evaluation.
arXiv Detail & Related papers (2025-05-30T03:07:58Z)
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs [0.0]
Gait impairment plays an important role in early diagnosis, disease monitoring, and treatment evaluation for neurodegenerative diseases. Recent deep learning-based approaches have consistently improved classification accuracies, but they often lack interpretability. We introduce AGIR, a novel pipeline consisting of a pre-trained VQ-VAE motion tokenizer and a Large Language Model (LLM) fine-tuned over pairs of motion tokens.
arXiv Detail & Related papers (2025-03-23T17:12:16Z)
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references. We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey. Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
Taxonomy Adaptive Cross-Domain Adaptation in Medical Imaging via Optimization Trajectory Distillation [73.83178465971552]
The success of automated medical image analysis depends on large-scale and expert-annotated training sets. Unsupervised domain adaptation (UDA) has been raised as a promising approach to alleviate the burden of labeled data collection. We propose optimization trajectory distillation, a unified approach to address the two technical challenges from a new perspective.
arXiv Detail & Related papers (2023-07-27T08:58:05Z)
Performance of Dual-Augmented Lagrangian Method and Common Spatial Patterns applied in classification of Motor-Imagery BCI [68.8204255655161]
Motor-imagery based brain-computer interfaces (MI-BCI) have the potential to become ground-breaking technologies for neurorehabilitation. Due to the noisy nature of the used EEG signal, reliable BCI systems require specialized procedures for features optimization and extraction.
arXiv Detail & Related papers (2020-10-13T20:50:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.