Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering
- URL: http://arxiv.org/abs/2403.04890v3
- Date: Tue, 15 Oct 2024 21:03:11 GMT
- Title: Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering
- Authors: Saeel Sandeep Nachane, Ojas Gramopadhye, Prateek Chanda, Ganesh Ramakrishnan, Kshitij Sharad Jadhav, Yatin Nandwani, Dinesh Raghu, Sachindra Joshi,
- Abstract summary: We propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN.
It contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers.
We implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning.
- Score: 24.43605359639671
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning, reaching a correct response to medical questions. We empirically demonstrate how CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Li\'evin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we utilize a reward model mechanism, replacing the elimination process performed by MCQ-ELIMINATIVE.
Related papers
- Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs [23.50838763761289]
We propose Mentor-Intern Collaborative Search (MICS) to generate rigorous and effective medical chain-of-thought data.<n>The reasoning performance is determined by an MICS-Score, which assesses the quality of generated reasoning paths.<n>Eventually, we construct MMRP, a multi-task medical reasoning dataset with ranked difficulty, and Chiron-o1, a new medical MLLM devised via a curriculum learning strategy.
arXiv Detail & Related papers (2025-06-20T12:51:19Z) - Language Agents for Hypothesis-driven Clinical Decision Making with Reinforcement Learning [38.49879425944787]
We propose to model clinical decision-making for diagnosis with a hypothesis-driven uncertainty-aware language agent, LA-CDM.<n>We train LA-CDM with three objectives targeting critical aspects of clinical decision-making: accurate hypothesis generation, hypothesis uncertainty estimation, and efficient decision-making.<n>We evaluate our methodology on MIMIC-CDM, a real-world dataset covering four abdominal diseases.
arXiv Detail & Related papers (2025-06-16T13:32:01Z) - MAGI: Multi-Agent Guided Interview for Psychiatric Assessment [50.6150986786028]
We present MAGI, the first framework that transforms the gold-standard Mini International Neuropsychiatric Interview (MINI) into automatic computational navigation.
We show that MAGI advances LLM- assisted mental health assessment by combining clinical rigor, conversational adaptability, and explainable reasoning.
arXiv Detail & Related papers (2025-04-25T11:08:27Z) - Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images [4.353855760968461]
Cross-Modal Clinical Knowledge Distiller (ClinKD) designed to enhance image-text alignment and establish more effective medical knowledge adaptation mechanisms.
ClinKD achieves state-of-the-art performance on the Med-GRIT-270k dataset, a challenging medical benchmark containing fine-grained multi-task QA pairs.
arXiv Detail & Related papers (2025-02-09T15:08:10Z) - MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method.
It is designed to enhance interpretability and accuracy in biomedical imaging inquiries.
Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z) - Medchain: Bridging the Gap Between LLM Agents and Clinical Practice through Interactive Sequential Benchmarking [58.25862290294702]
We present MedChain, a dataset of 12,163 clinical cases that covers five key stages of clinical workflow.
We also propose MedChain-Agent, an AI system that integrates a feedback mechanism and a MCase-RAG module to learn from previous cases and adapt its responses.
arXiv Detail & Related papers (2024-12-02T15:25:02Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - Large Language Models in the Clinic: A Comprehensive Benchmark [63.21278434331952]
We build a benchmark ClinicBench to better understand large language models (LLMs) in the clinic.
We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks.
We then construct six novel datasets and clinical tasks that are complex but common in real-world practice.
We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings.
arXiv Detail & Related papers (2024-04-25T15:51:06Z) - IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents [0.0]
This paper presents MedReAct'N'MedReFlex, which leverages a suite of four medical agents to detect and correct errors in clinical notes.
One core component of our method is our RAG pipeline based on our ClinicalCorp corpora.
Our results demonstrate the central role of our RAG approach with ClinicalCorp leveraged through the MedReAct'N'MedReFlex framework.
arXiv Detail & Related papers (2024-04-23T20:00:37Z) - ArgMed-Agents: Explainable Clinical Decision Reasoning with LLM Disscusion via Argumentation Schemes [7.950883198425716]
ArgMed-Agents is a framework to enable large language models (LLMs) to make explainable clinical decision reasoning through interaction.
We construct a formal model of ArgMed-Agents and present conjectures for theoretical guarantees.
setup experiments show that ArgMed-Agents not only improves accuracy in complex clinical decision reasoning problems compared to other prompt methods, but more importantly, it provides users with decision explanations that increase their confidence.
arXiv Detail & Related papers (2024-03-10T19:47:00Z) - Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions [19.436999992810797]
We construct two new datasets: JAMA Clinical Challenge and Medbullets.
JAMA Clinical Challenge consists of questions based on challenging clinical cases, while Medbullets comprises simulated clinical questions.
We evaluate seven LLMs on the two datasets using various prompts.
arXiv Detail & Related papers (2024-02-28T05:44:41Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales [15.362903610463285]
We present a "reasoning-aware" diagnosis framework that rationalizes the diagnostic process via prompt-based learning.
We propose a novel set of criteria for evaluating machine-generated rationales' potential for real-world clinical settings.
arXiv Detail & Related papers (2023-12-12T16:14:45Z) - CLIP in Medical Imaging: A Comprehensive Survey [59.429714742927956]
Contrastive Language-Image Pre-training successfully introduces text supervision to vision models.
It has shown promising results across various tasks, attributable to its generalizability and interpretability.
Use of CLIP has recently gained increasing interest in the medical imaging domain.
arXiv Detail & Related papers (2023-12-12T15:21:57Z) - Diagnostic Reasoning Prompts Reveal the Potential for Large Language
Model Interpretability in Medicine [4.773117448586697]
We develop novel diagnostic reasoning prompts to study whether large language models (LLMs) can perform clinical reasoning to accurately form a diagnosis.
We find GPT4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy.
arXiv Detail & Related papers (2023-08-13T19:04:07Z) - VBridge: Connecting the Dots Between Features, Explanations, and Data
for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z) - Inheritance-guided Hierarchical Assignment for Clinical Automatic
Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.