From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making
- URL: http://arxiv.org/abs/2505.10282v1
- Date: Thu, 15 May 2025 13:30:39 GMT
- Title: From Questions to Clinical Recommendations: Large Language Models Driving Evidence-Based Clinical Decision Making
- Authors: Dubai Li, Nan Jiang, Kangping Huang, Ruiqi Tu, Shuyu Ouyang, Huayu Yu, Lin Qiao, Chen Yu, Tianshu Zhou, Danyang Tong, Qian Wang, Mengtao Li, Xiaofeng Zeng, Yu Tian, Xinping Tian, Jingsong Li
- Abstract summary: This study introduces Quicker, an evidence-based clinical decision support system powered by large language models (LLMs).
- Score: 14.48848307485775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical evidence, derived from rigorous research and data analysis, provides healthcare professionals with reliable scientific foundations for informed decision-making. Integrating clinical evidence into real-time practice is challenging due to the enormous workload, complex professional processes, and time constraints. This highlights the need for tools that automate evidence synthesis to support more efficient and accurate decision making in clinical settings. This study introduces Quicker, an evidence-based clinical decision support system powered by large language models (LLMs), designed to automate evidence synthesis and generate clinical recommendations modeled after standard clinical guideline development processes. Quicker implements a fully automated chain that covers all phases, from questions to clinical recommendations, and further enables customized decision-making through integrated tools and interactive user interfaces. To evaluate Quicker's capabilities, we developed the Q2CRBench-3 benchmark dataset, based on clinical guideline development records for three different diseases. Experimental results highlighted Quicker's strong performance, with fine-grained question decomposition tailored to user preferences, retrieval sensitivities comparable to human experts, and literature screening performance approaching comprehensive inclusion of relevant studies. In addition, Quicker-assisted evidence assessment effectively supported human reviewers, while Quicker's recommendations were more comprehensive and logically coherent than those of clinicians. In system-level testing, collaboration between a single reviewer and Quicker reduced the time required for recommendation development to 20-40 minutes. In general, our findings affirm the potential of Quicker to help physicians make quicker and more reliable evidence-based clinical decisions.
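The paper does not reproduce Quicker's implementation, but the phased chain the abstract describes (question decomposition, retrieval, screening, evidence assessment, recommendation drafting) can be pictured with a short sketch. Everything below is hypothetical: `ask_llm` stands in for an unspecified chat-completion backend, and the prompts and `Evidence` fields are illustrative, not Quicker's.

```python
# Hypothetical sketch of a Quicker-style question-to-recommendation chain.
# `ask_llm`, the prompts, and the Evidence fields are illustrative placeholders.
from dataclasses import dataclass

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion backend."""
    raise NotImplementedError("plug in an LLM client here")

@dataclass
class Evidence:
    citation: str
    summary: str
    quality: str = "unrated"  # filled in by the assessment phase (e.g. a GRADE level)

def decompose_question(clinical_question: str) -> list[str]:
    # Phase 1: break a broad question into fine-grained, PICO-style sub-questions.
    reply = ask_llm(f"Decompose into PICO sub-questions, one per line:\n{clinical_question}")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def screen(sub_question: str, candidates: list[Evidence]) -> list[Evidence]:
    # Phases 2-3: after literature retrieval (backend unspecified), LLM-screen
    # each candidate study for inclusion.
    return [e for e in candidates
            if ask_llm(f"Include '{e.citation}' for: {sub_question}? yes/no") == "yes"]

def recommend(question: str, evidence: list[Evidence]) -> str:
    # Phases 4-5: assess the evidence body, then draft a guideline-style recommendation.
    body = "\n".join(f"- {e.citation}: {e.summary} [{e.quality}]" for e in evidence)
    return ask_llm(f"Given this evidence:\n{body}\nDraft a recommendation for: {question}")
```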
Related papers
- Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning [14.866862028323053]
Large Language Models (LLMs) have been widely adopted in healthcare due to their extensive knowledge reserves, retrieval, and communication capabilities.
While LLMs show promise and excel at medical benchmarks, their diagnostic reasoning and inquiry skills remain constrained.
We propose (1) the Clinical Diagnostic Reasoning Data (CDRD) structure to capture abstract clinical reasoning logic, together with a pipeline for its construction, and (2) Dr. Assistant, a clinical diagnostic model equipped with clinical reasoning and inquiry skills.
arXiv Detail & Related papers (2026-01-20T07:43:57Z)
- EHRNavigator: A Multi-Agent System for Patient-Level Clinical Question Answering over Heterogeneous Electronic Health Records [31.559633376006442]
EHRNavigator is a multi-agent framework that harnesses AI agents to perform patient-level question answering across heterogeneous and multimodal EHR data.
It achieved 86% accuracy on real-world cases while maintaining clinically acceptable response times.
arXiv Detail & Related papers (2026-01-15T03:02:15Z)
- Human-AI Co-design for Clinical Prediction Models [8.099095493749868]
We introduce HACHI, an iterative human-in-the-loop framework that uses AI agents to accelerate the development of fully interpretable clinical prediction models.
In two real-world prediction tasks, HACHI outperforms existing approaches, surfaces new clinically relevant concepts, and improves model generalizability across clinical sites and time periods.
arXiv Detail & Related papers (2026-01-14T02:04:34Z)
- ClinDEF: A Dynamic Evaluation Framework for Large Language Models in Clinical Reasoning [58.01333341218153]
We propose ClinDEF, a dynamic framework for assessing clinical reasoning in LLMs through simulated diagnostic dialogues.
Our method generates patient cases and facilitates multi-turn interactions between an LLM-based doctor and an automated patient agent.
Experiments show that ClinDEF effectively exposes critical clinical reasoning gaps in state-of-the-art LLMs.
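The multi-turn doctor-patient protocol described above is easy to picture as a loop; the skeleton below is a hypothetical illustration, not the ClinDEF code, with `doctor_llm` and `patient_agent` as placeholder callables over a running transcript.

```python
# Hypothetical skeleton of a simulated diagnostic dialogue between two agents.
# `doctor_llm`, `patient_agent`, and the stopping convention are placeholders.
from typing import Callable

Agent = Callable[[list[str]], str]  # maps the transcript so far to the next utterance

def run_diagnostic_dialogue(doctor_llm: Agent, patient_agent: Agent,
                            opening_complaint: str, max_turns: int = 10) -> list[str]:
    transcript = [f"Patient: {opening_complaint}"]
    for _ in range(max_turns):
        move = doctor_llm(transcript)               # ask a question or commit to a diagnosis
        transcript.append(f"Doctor: {move}")
        if move.lower().startswith("diagnosis:"):   # illustrative stopping convention
            break
        transcript.append(f"Patient: {patient_agent(transcript)}")
    return transcript  # scored afterwards against the generated case's ground truth
```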
arXiv Detail & Related papers (2025-12-29T12:58:58Z)
- Timely Clinical Diagnosis through Active Test Selection [49.091903570068155]
We propose ACTMED (Adaptive Clinical Test selection via Model-based Experimental Design) to better emulate real-world diagnostic reasoning.
LLMs act as flexible simulators, generating plausible patient state distributions and supporting belief updates without requiring structured, task-specific training data.
We evaluate ACTMED on real-world datasets and show it can optimize test selection to improve diagnostic accuracy, interpretability, and resource use.
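Selecting the next test by model-based experimental design typically means maximizing expected information gain over a belief about the diagnosis. The toy example below illustrates that computation with invented diagnoses, tests, and likelihoods; it is not the ACTMED implementation, which uses LLM-generated patient state distributions instead of fixed tables.

```python
# Toy illustration of choosing the next test by expected information gain.
# Diagnoses, tests, and likelihoods are invented for the example.
import math

def entropy(p: dict[str, float]) -> float:
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

belief = {"pneumonia": 0.5, "bronchitis": 0.3, "asthma": 0.2}  # P(diagnosis)

# P(test is positive | diagnosis) for each candidate test.
positive_rate = {
    "chest_xray": {"pneumonia": 0.90, "bronchitis": 0.30, "asthma": 0.10},
    "spirometry": {"pneumonia": 0.20, "bronchitis": 0.40, "asthma": 0.90},
}

def expected_information_gain(test: str) -> float:
    gain = entropy(belief)
    for positive in (True, False):
        # Likelihood of this outcome under each candidate diagnosis.
        lik = {d: positive_rate[test][d] if positive else 1 - positive_rate[test][d]
               for d in belief}
        p_outcome = sum(belief[d] * lik[d] for d in belief)
        posterior = {d: belief[d] * lik[d] / p_outcome for d in belief}
        gain -= p_outcome * entropy(posterior)  # subtract expected posterior entropy
    return gain

best = max(positive_rate, key=expected_information_gain)
print(best, {t: round(expected_information_gain(t), 3) for t in positive_rate})
```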
arXiv Detail & Related papers (2025-10-21T18:10:45Z)
- Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models [51.91760712805404]
We introduce VivaBench, a benchmark for evaluating sequential clinical reasoning in large language models (LLMs).
Our dataset consists of 1,762 physician-curated clinical vignettes structured as interactive scenarios that simulate a viva voce (oral) examination in medical training.
Our analysis identified several failure modes that mirror common cognitive errors in clinical practice.
arXiv Detail & Related papers (2025-10-11T16:24:35Z)
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning.
This paper provides the first systematic review of this emerging field.
We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
- Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.
We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.
Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
- Expertise Is What We Want [0.0]
We share an application architecture, the Large Language Expert (LLE), that combines the flexibility and power of Large Language Models (LLMs) with the interpretability, explainability, and reliability of Expert Systems.
To highlight the power of the LLE system, we built an LLE to assist with the workup of patients newly diagnosed with cancer.
arXiv Detail & Related papers (2025-02-27T18:05:15Z)
- VIEWER: an extensible visual analytics framework for enhancing mental healthcare [2.486545071109632]
VIEWER is an open-source toolkit that employs natural language processing and interactive visualisation techniques.
It was designed and implemented in one of the UK's largest NHS mental health Trusts.
VIEWER significantly improved performance and task completion speed compared to the standard clinical information system.
arXiv Detail & Related papers (2024-10-25T14:01:13Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [54.98321887435557]
This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z)
- ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning [16.04933261211837]
Large Language Models (LLMs) and multi-agent systems have shown impressive capabilities in natural language tasks but face challenges in clinical trial applications.
We introduce Clinical Agent System (ClinicalAgent), a clinical multi-agent system designed for clinical trial tasks.
arXiv Detail & Related papers (2024-04-23T06:30:53Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor, as the player, and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models [29.05425041393475]
Generative Large Language Models (LLMs) hold significant promise in healthcare.
This study assessed the potential of LLMs to function as autonomous agents in a simulated tertiary care medical center.
arXiv Detail & Related papers (2024-01-05T15:09:57Z)
- Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models [0.23463422965432823]
BooksMed is a novel framework based on a Large Language Model (LLM).
It emulates human cognitive processes to deliver evidence-based and reliable responses.
We present ExpertMedQA, a benchmark comprised of open-ended, expert-level clinical questions.
arXiv Detail & Related papers (2023-10-17T13:39:26Z)
- Utilizing ChatGPT to Enhance Clinical Trial Enrollment [2.3551878971309947]
We propose an automated approach that leverages ChatGPT, a large language model, to extract patient-related information from unstructured clinical notes.
Our empirical evaluation, conducted on two benchmark retrieval collections, shows improved retrieval performance compared to existing approaches.
These findings highlight the potential use of ChatGPT to enhance clinical trial enrollment while ensuring the quality of medical service and minimizing direct risks to patients.
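The paper's prompts are not reproduced here; as a minimal sketch of the extraction step it describes, assuming the OpenAI Python client and an illustrative field list and model name, the call might look like the following.

```python
# Minimal sketch: extract trial-relevant fields from an unstructured clinical note.
# The prompt, field list, and model choice are illustrative, not the paper's.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_patient_info(note: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; the paper used ChatGPT
        messages=[
            {"role": "system",
             "content": "Extract patient facts as JSON with keys: "
                        "age, sex, diagnoses, medications."},
            {"role": "user", "content": note},
        ],
        response_format={"type": "json_object"},  # constrain output to valid JSON
    )
    return json.loads(response.choices[0].message.content)
```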
arXiv Detail & Related papers (2023-06-03T10:54:23Z)
- AutoTrial: Prompting Language Models for Clinical Trial Design [53.630479619856516]
We present a method named AutoTrial to aid the design of clinical eligibility criteria using language models.
Experiments on over 70K clinical trials verify that AutoTrial generates high-quality criteria texts.
arXiv Detail & Related papers (2023-05-19T01:04:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.