Few shot chain-of-thought driven reasoning to prompt LLMs for open ended
medical question answering
- URL: http://arxiv.org/abs/2403.04890v1
- Date: Thu, 7 Mar 2024 20:48:40 GMT
- Title: Few shot chain-of-thought driven reasoning to prompt LLMs for open ended
medical question answering
- Authors: Ojas Gramopadhye, Saeel Sandeep Nachane, Prateek Chanda, Ganesh
Ramakrishnan, Kshitij Sharad Jadhav, Yatin Nandwani, Dinesh Raghu, Sachindra
Joshi
- Abstract summary: We propose a modified version of the MedQA-USMLE dataset, which is subjective, to mimic real-life clinical scenarios.
We develop better in-context learning strategies by modifying the 5-shot-codex-CoT prompt from arXiv:2207.08143 for the subjective MedQA dataset and developing our incremental-reasoning prompt.
- Score: 25.163347677278182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language models (LLMs) have demonstrated significant potential in
transforming healthcare by automating tasks such as clinical documentation,
information retrieval, and decision support. In this regard, carefully
engineered prompts have emerged as a powerful tool for applying LLMs to
medical scenarios, e.g., patient clinical cases. In this paper, we propose a
modified version of the MedQA-USMLE dataset, which is subjective, to mimic
real-life clinical scenarios. We explore the Chain of Thought (CoT) reasoning
based on subjective response generation for the modified MedQA-USMLE dataset
with appropriate LM-driven forward reasoning for correct responses to the
medical questions. Given the importance of response verification in the
medical setting, we utilize a reward training mechanism whereby the language
model also provides a verified response for each answer to a clinical
question. We also include a human-in-the-loop for several evaluation aspects.
We develop better in-context learning strategies by modifying the
5-shot-codex-CoT prompt from arXiv:2207.08143 for the subjective MedQA
dataset and by developing our incremental-reasoning prompt. Our evaluations
show that the incremental
reasoning prompt performs better than the modified codex prompt in certain
scenarios. We also show that greedy decoding with the incremental reasoning
method performs better than other strategies, such as prompt chaining and
eliminative reasoning.
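The prompting setup described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual prompt: the exemplar text, the `build_cot_prompt` helper, and the toy `greedy_decode` stand-in for temperature-0 LLM decoding are all assumptions for demonstration.

```python
# Hypothetical sketch of few-shot chain-of-thought (CoT) prompting with
# greedy decoding, in the spirit of the paper. Exemplar content and helper
# names are illustrative only; the paper's 5-shot prompts differ.

# Each exemplar pairs a clinical question with a worked reasoning chain
# that precedes the answer, nudging the model to reason step by step.
EXEMPLARS = [
    {
        "question": "A 45-year-old presents with crushing chest pain ...",
        "reasoning": "Step 1: The presentation suggests acute coronary "
                     "syndrome. Step 2: ECG and troponins are indicated.",
        "answer": "Obtain an ECG and serial troponins.",
    },
    # ... the paper's setting uses 5 exemplars (5-shot).
]

def build_cot_prompt(exemplars, question):
    """Assemble a few-shot CoT prompt: each exemplar shows its reasoning
    chain before its answer, and the new question ends at 'Reasoning:' so
    the model continues with a reasoning chain of its own."""
    parts = []
    for ex in exemplars:
        parts.append(f"Question: {ex['question']}\n"
                     f"Reasoning: {ex['reasoning']}\n"
                     f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}\nReasoning:")
    return "\n".join(parts)

def greedy_decode(scores_per_step, vocab):
    """Greedy decoding: at every step pick the highest-scoring token
    (a toy stand-in for calling an LLM with temperature 0)."""
    return [vocab[max(range(len(step)), key=step.__getitem__)]
            for step in scores_per_step]
```

The incremental-reasoning variant would extend this loop by appending each partial reasoning step back into the prompt before the next decoding call.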
Related papers
- MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning [36.400896909161006]
In high-stakes domains like clinical reasoning, AI assistants powered by large language models (LLMs) are not yet reliable and safe.
We propose to develop more careful LLMs that ask follow-up questions to gather necessary and sufficient information and respond reliably.
We introduce MEDIQ, a framework to simulate realistic clinical interactions.
arXiv Detail & Related papers (2024-06-03T01:32:52Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- OLAPH: Improving Factuality in Biomedical Long-form Question Answering [15.585833125854418]
We introduce MedLFQA, a benchmark dataset reconstructed using long-form question-answering datasets related to the biomedical domain.
We also propose OLAPH, a simple and novel framework that enables the improvement of factuality through automatic evaluations.
Our findings reveal that a 7B LLM trained with our OLAPH framework can provide long answers comparable to the medical experts' answers in terms of factuality.
arXiv Detail & Related papers (2024-05-21T11:50:16Z)
- Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models [10.04914417538886]
Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment.
We propose a new Distill-Retrieve-Read framework instead of the previous Retrieve-then-Read.
arXiv Detail & Related papers (2024-04-27T13:11:42Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the deployment cost of such large models, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
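Feature-level distillation, as summarized above, trains a compact student to match the teacher's intermediate feature vectors rather than only its final predictions. The following is a toy sketch under assumed conditions (a linear student, hand-rolled MSE gradient, made-up dimensions), not the paper's actual setup:

```python
# Illustrative sketch of feature-level knowledge distillation: a small
# student model is nudged toward the teacher's (e.g., an LLM's) feature
# vector for each input. All shapes and the one-step update are toy
# assumptions for demonstration.

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_step(student_weights, x, teacher_feature, lr=0.1):
    """One gradient step pushing the student's linear feature map
    s_j = sum_i W[j][i] * x[i] toward the teacher's feature for input x.
    Returns the updated weights and the pre-update feature-matching loss."""
    student_feature = [sum(w_ji * x_i for w_ji, x_i in zip(row, x))
                       for row in student_weights]
    d = len(teacher_feature)
    # Gradient of the MSE loss w.r.t. W[j][i] is 2 * (s_j - t_j) * x_i / d.
    new_w = [[w_ji - lr * 2 * (s_j - t_j) * x_i / d
              for w_ji, x_i in zip(row, x)]
             for row, s_j, t_j in zip(student_weights, student_feature,
                                      teacher_feature)]
    return new_w, mse(student_feature, teacher_feature)
```

Iterating `distill_step` over a batch of (input, teacher-feature) pairs drives the feature-matching loss down, which is the core mechanism such distillation relies on.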
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
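The extract-then-verify pattern described in this summary can be sketched as a two-pass loop: first extract a value, then accept it only if the model can point to supporting evidence in the source note. The `ask_llm` stub below is a hypothetical placeholder (a naive substring heuristic so the example runs without a model); a real system would make two LLM calls.

```python
# Minimal sketch of self-verification for clinical extraction: pass 1
# extracts a candidate value, pass 2 checks the candidate against the note
# and records its provenance. `ask_llm` is an assumed stub, not a real API.

def ask_llm(prompt, note):
    """Stub standing in for an LLM extraction call: returns the first
    numeric-looking token (e.g., a dosage) found in the note."""
    for token in note.split():
        if token.rstrip("mg.,").isdigit():
            return token.rstrip(".,")
    return "NOT FOUND"

def extract_with_self_verification(note, field="dosage"):
    """Extract a field, then verify it by requiring the candidate to
    appear verbatim in the note (its provenance)."""
    candidate = ask_llm(f"Extract the {field} from the note.", note)
    # Verification pass: accept only if the model can point to the
    # supporting span in the source text.
    evidence = candidate if candidate in note else None
    return {"value": candidate,
            "verified": evidence is not None,
            "evidence": evidence}
```

With an actual LLM, the verification pass would itself be a prompt ("quote the sentence supporting this extraction"), and unverifiable candidates would be flagged for human review.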
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Large Language Models Encode Clinical Knowledge [21.630872464930587]
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation.
We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias.
We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning.
arXiv Detail & Related papers (2022-12-26T14:28:24Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions.
We propose an end-to-end variational reasoning approach to medical dialogue generation.
A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.