Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations
- URL: http://arxiv.org/abs/2501.17860v1
- Date: Wed, 29 Jan 2025 18:58:48 GMT
- Title: Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations
- Authors: Zijie Liu, Xinyu Zhao, Jie Peng, Zhuangdi Zhu, Qingyu Chen, Xia Hu, Tianlong Chen
- Abstract summary: We introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards.
We also explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes.
Experiments show that dialogue-tuned models outperform traditional methods, with improvements of $9.64\%$ in multi-round reasoning scenarios and $6.18\%$ in accuracy in a noisy environment.
- Score: 74.83732294523402
- Abstract: Current medical AI systems often fail to replicate real-world clinical reasoning, as they are predominantly trained and evaluated on static text and question-answer tasks. These tuning methods and benchmarks overlook critical aspects like evidence-based reasoning and handling distracting information. To bridge this gap, we introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards. Moreover, we explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes. Experiments show that dialogue-tuned models outperform traditional methods, with improvements of $9.64\%$ in multi-round reasoning scenarios and $6.18\%$ in accuracy in a noisy environment. Our findings highlight dialogue tuning as a promising approach for advancing clinically aligned and robust medical AI systems.
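The dialogue-based fine-tuning described in the abstract transforms static QA datasets into conversational formats. The sketch below illustrates the general idea of such a transformation; the field names (`question`, `rationale`, `answer`) and the turn structure are illustrative assumptions, not the paper's actual data schema or method.

```python
# Minimal sketch: turn a static QA item into a multi-turn dialogue sample.
# The field names and turn structure are illustrative assumptions only,
# not the schema used in the paper.

def qa_to_dialogue(item: dict) -> list[dict]:
    """Convert one static QA record into a list of chat-style turns."""
    turns = [{"role": "user", "content": item["question"]}]
    # Split the reasoning into iterative steps, one assistant turn each,
    # so the model is trained on stepwise reasoning rather than a monologue.
    for step in item.get("rationale", []):
        turns.append({"role": "assistant", "content": step})
        turns.append({"role": "user", "content": "Go on."})
    turns.append({"role": "assistant", "content": f"Final answer: {item['answer']}"})
    return turns

sample = {
    "question": "A 45-year-old presents with chest pain. Most likely diagnosis?",
    "rationale": ["First, consider cardiac causes.",
                  "ECG findings narrow the differential further."],
    "answer": "Acute coronary syndrome",
}
dialogue = qa_to_dialogue(sample)
```

Each intermediate rationale step becomes its own assistant turn, which is what lets the tuned model practice iterative, multi-round reasoning instead of emitting a single monolithic answer.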
Related papers
- MORTAR: Metamorphic Multi-turn Testing for LLM-based Dialogue Systems [7.7097144952707435]
We propose MORTAR, a MetamORphic multi-TuRn diAlogue testing appRoach.
MORTAR automates the generation of follow-up question-answer (QA) dialogue test cases.
It detects bugs of multi-turn dialogue systems in a low-cost manner.
arXiv Detail & Related papers (2024-12-20T04:31:03Z)
- MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues [41.23757609484281]
Speech recognition errors can significantly degrade the performance of downstream tasks like summarization.
We propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models.
LLMs can effectively model ASR noise, and incorporating this noisy data into the training process significantly improves the robustness and accuracy of medical dialogue summarization systems.
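As a rough illustration of the augmentation idea only (MEDSAGE itself uses LLMs to model ASR noise, not hand-written rules), ASR-style corruption can be simulated by word-level perturbations. The confusion pairs and probabilities below are hypothetical:

```python
import random

# Toy illustration of ASR-style noise injection for data augmentation.
# This simple word-level perturbation is NOT the MEDSAGE approach; it
# only sketches why noisy training copies can improve robustness.

# Hypothetical confusion pairs mimicking common ASR mis-hearings.
CONFUSIONS = {"hypertension": "high attention", "stat": "that"}

def add_asr_noise(text: str, drop_prob: float = 0.1, seed: int = 0) -> str:
    """Return a noisy copy of `text`: random word drops plus substitutions."""
    rng = random.Random(seed)
    noisy = []
    for word in text.split():
        if rng.random() < drop_prob:
            continue  # simulate a word dropped by the recognizer
        noisy.append(CONFUSIONS.get(word.lower(), word))
    return " ".join(noisy)

clean = "Patient has hypertension and needs labs stat"
print(add_asr_noise(clean))
```

Pairing each clean transcript with such noisy variants during training exposes the summarizer to recognition errors it will see at inference time.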
arXiv Detail & Related papers (2024-08-26T17:04:00Z)
- Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM [27.33193944412666]
Medical dialogue systems (MDS) enhance patient-physician communication, improve healthcare accessibility, and reduce costs.
However, acquiring suitable data to train these systems poses significant challenges.
Our approach, SynDial, uses a single LLM iteratively with zero-shot prompting and a feedback loop to generate high-quality synthetic dialogues.
arXiv Detail & Related papers (2024-08-12T16:49:22Z)
- Reasoning Like a Doctor: Improving Medical Dialogue Systems via Diagnostic Reasoning Process Alignment [14.563188427409958]
Our work aims to build a medical dialogue system that aligns with clinicians' diagnostic reasoning processes.
We propose a novel framework, Emulation, designed to generate an appropriate response.
Our framework furnishes clear explanations for the generated responses, enhancing its transparency in medical consultations.
arXiv Detail & Related papers (2024-06-20T02:02:53Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback [19.564416963801268]
We propose an approach called Preference Learning from Process Feedback (PLPF).
PLPF integrates the doctor's diagnostic logic into LLMs.
We show that PLPF enhances the diagnostic accuracy of the baseline model in medical conversations by 17.6%.
arXiv Detail & Related papers (2024-01-11T06:42:45Z)
- Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z)
- Semi-Supervised Variational Reasoning for Medical Dialogue Generation [70.838542865384]
Two key characteristics are relevant for medical dialogue generation: patient states and physician actions.
We propose an end-to-end variational reasoning approach to medical dialogue generation.
A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability.
arXiv Detail & Related papers (2021-05-13T04:14:35Z)
- Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually, reasoning over dialogue turns with the help of back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.