Related papers: DocCHA: Towards LLM-Augmented Interactive Online diagnosis System

DocCHA: Towards LLM-Augmented Interactive Online diagnosis System

URL: http://arxiv.org/abs/2507.07870v1
Date: Thu, 10 Jul 2025 15:52:04 GMT
Title: DocCHA: Towards LLM-Augmented Interactive Online diagnosis System
Authors: Xinyi Liu, Dachun Sun, Yi R. Fung, Dilek Hakkani-Tür, Tarek Abdelzaher,
Abstract summary: DocCHA is a confidence-aware, modular framework that emulates clinical reasoning by decomposing the diagnostic process into three stages.<n> evaluated on two real-world Chinese consultation datasets.
Score: 17.975659876934895
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the impressive capabilities of Large Language Models (LLMs), existing Conversational Health Agents (CHAs) remain static and brittle, incapable of adaptive multi-turn reasoning, symptom clarification, or transparent decision-making. This hinders their real-world applicability in clinical diagnosis, where iterative and structured dialogue is essential. We propose DocCHA, a confidence-aware, modular framework that emulates clinical reasoning by decomposing the diagnostic process into three stages: (1) symptom elicitation, (2) history acquisition, and (3) causal graph construction. Each module uses interpretable confidence scores to guide adaptive questioning, prioritize informative clarifications, and refine weak reasoning links. Evaluated on two real-world Chinese consultation datasets (IMCS21, DX), DocCHA consistently outperforms strong prompting-based LLM baselines (GPT-3.5, GPT-4o, LLaMA-3), achieving up to 5.18 percent higher diagnostic accuracy and over 30 percent improvement in symptom recall, with only modest increase in dialogue turns. These results demonstrate the effectiveness of DocCHA in enabling structured, transparent, and efficient diagnostic conversations -- paving the way for trustworthy LLM-powered clinical assistants in multilingual and resource-constrained settings.

Related papers

Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Model (MedVLM) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning.<n>We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z)
Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making [80.94208848596215]
We present a new concept called Catfish Agent, a role-specialized LLM designed to inject structured dissent and counter silent agreement.<n>Inspired by the catfish effect'' in organizational psychology, the Catfish Agent is designed to challenge emerging consensus to stimulate deeper reasoning.
arXiv Detail & Related papers (2025-05-27T17:59:50Z)
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark [0.29987253996125257]
3MDBench is an open-source framework for simulating and evaluating LVLM-driven telemedical consultations.<n> multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings.<n> injecting predictions from a diagnostic convolutional network into the LVLM's context boosts F1 by up to 20%.
arXiv Detail & Related papers (2025-03-26T07:32:05Z)
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.<n>We propose a framework encompassing three critical examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.<n>Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking, etc.
arXiv Detail & Related papers (2025-03-06T18:35:39Z)
Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.<n>We propose a novel approach utilizing structured medical reasoning.<n>Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
The Reliability of LLMs for Medical Diagnosis: An Examination of Consistency, Manipulation, and Contextual Awareness [0.0]
Large Language Models (LLMs) offer promise for democratizing healthcare with advanced diagnostics.<n>This study assesses their diagnostic reliability focusing on consistency, manipulation resilience, and contextual integration.<n>LLMs' vulnerability to manipulation and limited contextual awareness pose challenges in clinical use.
arXiv Detail & Related papers (2025-03-02T11:50:16Z)
SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy [45.2233252981348]
Large Language Models (LLMs) have been shown to encode clinical knowledge.<n>We present SemioLLM, an evaluation framework that benchmarks 6 state-of-the-art models.<n>We show that most LLMs are able to accurately and confidently generate probabilistic predictions of seizure onset zones in the brain.
arXiv Detail & Related papers (2024-07-03T11:02:12Z)
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning [36.400896909161006]
We develop systems that proactively ask questions to gather more information and respond reliably. We introduce a benchmark - MediQ - to evaluate question-asking ability in LLMs.
arXiv Detail & Related papers (2024-06-03T01:32:52Z)
Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging [0.33554367023486936]
Cancer staging status is available in clinical reports, but it requires natural language processing to extract it. With the advance in clinical-oriented large language models, it is promising to extract such status without extensive efforts in training the algorithms. In this study, we propose an ensemble reasoning approach with the aim of improving the consistency of the model generations.
arXiv Detail & Related papers (2024-04-19T19:34:35Z)
KNSE: A Knowledge-aware Natural Language Inference Framework for Dialogue Symptom Status Recognition [69.78432481474572]
We propose a novel framework called KNSE for symptom status recognition (SSR) For each mentioned symptom in a dialogue window, we first generate knowledge about the symptom and hypothesis about status of the symptom, to form a (premise, knowledge, hypothesis) triplet. The BERT model is then used to encode the triplet, which is further processed by modules including utterance aggregation, self-attention, cross-attention, and GRU to predict the symptom status.
arXiv Detail & Related papers (2023-05-26T11:23:26Z)
Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding [31.884600238089405]
We present Clinical Camel, an open large language model (LLM) explicitly tailored for clinical research. Fine-tuned from LLaMA-2 using QLoRA, Clinical Camel achieves state-of-the-art performance across medical benchmarks among openly available medical LLMs.
arXiv Detail & Related papers (2023-05-19T23:07:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.