Few shot chain-of-thought driven reasoning to prompt LLMs for open ended
medical question answering
- URL: http://arxiv.org/abs/2403.04890v1
- Date: Thu, 7 Mar 2024 20:48:40 GMT
- Title: Few shot chain-of-thought driven reasoning to prompt LLMs for open ended
medical question answering
- Authors: Ojas Gramopadhye, Saeel Sandeep Nachane, Prateek Chanda, Ganesh
Ramakrishnan, Kshitij Sharad Jadhav, Yatin Nandwani, Dinesh Raghu, Sachindra
Joshi
- Abstract summary: We propose a modified version of the MedQA-USMLE dataset, which is subjective, to mimic real-life clinical scenarios.
We develop better in-context learning strategies by modifying the 5-shot-codex-CoT prompt from arXiv:2207.08143 for the subjective MedQA dataset and developing our incremental-reasoning prompt.
- Score: 25.163347677278182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language models (LLMs) have demonstrated significant potential in
transforming healthcare by automating tasks such as clinical documentation,
information retrieval, and decision support. In this regard, carefully
engineered prompts have emerged as a powerful tool for applying LLMs to
medical scenarios, e.g., patient clinical cases. In this paper, we propose a
modified version of the MedQA-USMLE dataset, which is subjective, to mimic
real-life clinical scenarios. We explore the Chain of Thought (CoT) reasoning
based on subjective response generation for the modified MedQA-USMLE dataset
with appropriate LM-driven forward reasoning for correct responses to the
medical questions. Given the importance of response verification in the
medical setting, we utilize a reward training mechanism whereby the language
model also provides a verified response for each answer to a clinical
question. We also include a human-in-the-loop for several evaluation aspects.
We develop better in-context learning strategies by modifying the
5-shot-codex-CoT prompt from arXiv:2207.08143 for the subjective MedQA
dataset and by developing our incremental-reasoning prompt. Our evaluations
show that the incremental
reasoning prompt performs better than the modified codex prompt in certain
scenarios. We also show that greedy decoding with the incremental reasoning
method performs better than other strategies, such as prompt chaining and
eliminative reasoning.
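The prompting setup described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual prompt: the exemplar text, the `build_cot_prompt` helper, and the toy `greedy_decode` stand-in for temperature-0 LLM decoding are all assumptions for demonstration.

```python
# Hypothetical sketch of few-shot chain-of-thought (CoT) prompting with
# greedy decoding, in the spirit of the paper. Exemplar content and helper
# names are illustrative only; the paper's 5-shot prompts differ.

# Each exemplar pairs a clinical question with a worked reasoning chain
# that precedes the answer, nudging the model to reason step by step.
EXEMPLARS = [
    {
        "question": "A 45-year-old presents with crushing chest pain ...",
        "reasoning": "Step 1: The presentation suggests acute coronary "
                     "syndrome. Step 2: ECG and troponins are indicated.",
        "answer": "Obtain an ECG and serial troponins.",
    },
    # ... the paper's setting uses 5 exemplars (5-shot).
]

def build_cot_prompt(exemplars, question):
    """Assemble a few-shot CoT prompt: each exemplar shows its reasoning
    chain before its answer, and the new question ends at 'Reasoning:' so
    the model continues with a reasoning chain of its own."""
    parts = []
    for ex in exemplars:
        parts.append(f"Question: {ex['question']}\n"
                     f"Reasoning: {ex['reasoning']}\n"
                     f"Answer: {ex['answer']}\n")
    parts.append(f"Question: {question}\nReasoning:")
    return "\n".join(parts)

def greedy_decode(scores_per_step, vocab):
    """Greedy decoding: at every step pick the highest-scoring token
    (a toy stand-in for calling an LLM with temperature 0)."""
    return [vocab[max(range(len(step)), key=step.__getitem__)]
            for step in scores_per_step]
```

The incremental-reasoning variant would extend this loop by appending each partial reasoning step back into the prompt before the next decoding call.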
Related papers
- MEDIQ: Question-Asking LLMs for Adaptive and Reliable Clinical Reasoning [36.400896909161006]
In high-stakes domains like clinical reasoning, AI assistants powered by large language models (LLMs) are not yet reliable and safe.
We propose to develop more careful LLMs that ask follow-up questions to gather necessary and sufficient information and respond reliably.
We introduce MEDIQ, a framework to simulate realistic clinical interactions.
arXiv Detail & Related papers (2024-06-03T01:32:52Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- OLAPH: Improving Factuality in Biomedical Long-form Question Answering [15.585833125854418]
We introduce MedLFQA, a benchmark dataset reconstructed using long-form question-answering datasets related to the biomedical domain.
We also propose OLAPH, a simple and novel framework that enables the improvement of factuality through automatic evaluations.
Our findings reveal that a 7B LLM trained with our OLAPH framework can provide long answers comparable to the medical experts' answers in terms of factuality.
arXiv Detail & Related papers (2024-05-21T11:50:16Z)
- Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models [10.04914417538886]
Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment.
We propose a new Distill-Retrieve-Read framework instead of the previous Retrieve-then-Read.
arXiv Detail & Related papers (2024-04-27T13:11:42Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the deployment cost of such large models, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
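Feature-level distillation, as summarized above, trains a compact student to match the teacher's intermediate feature vectors rather than only its final predictions. The following is a toy sketch under assumed conditions (a linear student, hand-rolled MSE gradient, made-up dimensions), not the paper's actual setup:

```python
# Illustrative sketch of feature-level knowledge distillation: a small
# student model is nudged toward the teacher's (e.g., an LLM's) feature
# vector for each input. All shapes and the one-step update are toy
# assumptions for demonstration.

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_step(student_weights, x, teacher_feature, lr=0.1):
    """One gradient step pushing the student's linear feature map
    s_j = sum_i W[j][i] * x[i] toward the teacher's feature for input x.
    Returns the updated weights and the pre-update feature-matching loss."""
    student_feature = [sum(w_ji * x_i for w_ji, x_i in zip(row, x))
                       for row in student_weights]
    d = len(teacher_feature)
    # Gradient of the MSE loss w.r.t. W[j][i] is 2 * (s_j - t_j) * x_i / d.
    new_w = [[w_ji - lr * 2 * (s_j - t_j) * x_i / d
              for w_ji, x_i in zip(row, x)]
             for row, s_j, t_j in zip(student_weights, student_feature,
                                      teacher_feature)]
    return new_w, mse(student_feature, teacher_feature)
```

Iterating `distill_step` over a batch of (input, teacher-feature) pairs drives the feature-matching loss down, which is the core mechanism such distillation relies on.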
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
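The extract-then-verify pattern described in this summary can be sketched as a two-pass loop: first extract a value, then accept it only if the model can point to supporting evidence in the source note. The `ask_llm` stub below is a hypothetical placeholder (a naive substring heuristic so the example runs without a model); a real system would make two LLM calls.

```python
# Minimal sketch of self-verification for clinical extraction: pass 1
# extracts a candidate value, pass 2 checks the candidate against the note
# and records its provenance. `ask_llm` is an assumed stub, not a real API.

def ask_llm(prompt, note):
    """Stub standing in for an LLM extraction call: returns the first
    numeric-looking token (e.g., a dosage) found in the note."""
    for token in note.split():
        if token.rstrip("mg.,").isdigit():
            return token.rstrip(".,")
    return "NOT FOUND"

def extract_with_self_verification(note, field="dosage"):
    """Extract a field, then verify it by requiring the candidate to
    appear verbatim in the note (its provenance)."""
    candidate = ask_llm(f"Extract the {field} from the note.", note)
    # Verification pass: accept only if the model can point to the
    # supporting span in the source text.
    evidence = candidate if candidate in note else None
    return {"value": candidate,
            "verified": evidence is not None,
            "evidence": evidence}
```

With an actual LLM, the verification pass would itself be a prompt ("quote the sentence supporting this extraction"), and unverifiable candidates would be flagged for human review.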
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Large Language Models Encode Clinical Knowledge [21.630872464930587]
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation.
We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias.
We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning.
arXiv Detail & Related papers (2022-12-26T14:28:24Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions.
We propose an end-to-end variational reasoning approach to medical dialogue generation.
A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.