Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language
Model through Expert Feedback and Real-world Multi-turn Dialogue
- URL: http://arxiv.org/abs/2308.03549v3
- Date: Thu, 28 Dec 2023 15:20:24 GMT
- Title: Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language
Model through Expert Feedback and Real-world Multi-turn Dialogue
- Authors: Songhua Yang, Hanjie Zhao, Senbin Zhu, Guangyu Zhou, Hongfei Xu,
Yuxiang Jia, Hongying Zan
- Abstract summary: We introduce Zhongjing, the first Chinese medical Large Language Model (LLM) that implements an entire training pipeline, from continuous pre-training and SFT to Reinforcement Learning from Human Feedback (RLHF).
We construct a Chinese multi-turn medical dialogue dataset of 70,000 authentic doctor-patient dialogues, CMtMedQA, which significantly enhances the model's capability for complex dialogue and proactive inquiry initiation.
- Score: 4.558040877516838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Large Language Models (LLMs) have achieved remarkable
breakthroughs in understanding and responding to user intents. However, in some
specialized domains, such as Chinese medicine, their performance lags behind
that in general use cases. Existing efforts to incorporate Chinese medicine into LLMs
rely on Supervised Fine-Tuning (SFT) with single-turn and distilled dialogue
data. These models lack the ability for doctor-like proactive inquiry and
multi-turn comprehension and cannot align responses with experts' intentions.
In this work, we introduce Zhongjing, the first Chinese medical LLaMA-based LLM
that implements an entire training pipeline, from continuous pre-training and SFT
to Reinforcement Learning from Human Feedback (RLHF). Additionally, we
construct a Chinese multi-turn medical dialogue dataset of 70,000 authentic
doctor-patient dialogues, CMtMedQA, which significantly enhances the model's
capability for complex dialogue and proactive inquiry initiation. We also
define a refined annotation rule and evaluation criteria given the unique
characteristics of the biomedical domain. Extensive experimental results show
that Zhongjing outperforms baselines in various capacities and matches the
performance of ChatGPT in some abilities, despite having roughly 100x fewer parameters. Ablation
studies also demonstrate the contributions of each component: pre-training
enhances medical knowledge, and RLHF further improves instruction-following
ability and safety. Our code, datasets, and models are available at
https://github.com/SupritYoung/Zhongjing.
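The abstract describes a three-stage pipeline: continuous pre-training on medical corpora, SFT on multi-turn dialogue, then RLHF. As a rough illustration only (not the authors' released training code), the first two stages reduce to causal-LM training on different corpora. The sketch below assumes the HuggingFace transformers and datasets libraries; the checkpoint name, file paths, and hyperparameters are hypothetical placeholders.

# Rough sketch of stages 1-2 (continuous pre-training, then SFT), assuming
# the HuggingFace transformers/datasets stack. Checkpoint name, file paths,
# and hyperparameters are hypothetical placeholders, not the paper's setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "path/to/llama-base"  # the paper builds on a LLaMA-family checkpoint

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

def train_stage(output_dir, data_file):
    # Both stages are plain causal-LM training; they differ only in data:
    # raw medical text for pre-training, flattened doctor-patient dialogue
    # transcripts (CMtMedQA-style) for SFT.
    ds = load_dataset("json", data_files=data_file)["train"]
    ds = ds.map(tokenize, batched=True, remove_columns=ds.column_names)
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

train_stage("zhongjing-pt", "medical_corpus.jsonl")   # stage 1
train_stage("zhongjing-sft", "cmtmedqa_flat.jsonl")   # stage 2

RLHF (stage 3) would then fit a reward model on expert preference rankings over model outputs and optimize the SFT checkpoint against it, which is the component the abstract credits for the gains in instruction-following and safety.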
Related papers
- RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules.
We develop a medical dialogue dataset comprising rule-based communications between patients and physicians.
Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor (the player) and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z)
- Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning.
This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z)
- PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain [24.411904114158673]
We re-build the Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark into a large-scale prompt-tuning benchmark, PromptCBLUE.
Our benchmark is a suitable test-bed and an online platform for evaluating Chinese LLMs' multi-task capabilities on a wide range of biomedical tasks.
arXiv Detail & Related papers (2023-10-22T02:20:38Z)
- PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z)
- Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re³Writer method with retrieval-augmented generation and knowledge-grounded reasoning (a generic retrieval-augmentation sketch appears after this list).
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z)
- Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources [36.960318276653986]
We introduce a multi-target MRC task for the medical domain, whose goal is to predict answers to medical questions and the corresponding support sentences simultaneously.
We propose the Chinese medical BERT model for the task (CMedBERT), which fuses medical knowledge into pre-trained language models.
Experiments show that CMedBERT consistently outperforms strong baselines by fusing context-aware and knowledge-aware token representations.
arXiv Detail & Related papers (2020-08-24T11:23:28Z)
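As referenced in the Re³Writer entry above, the following is a generic retrieval-augmented generation sketch, not the Re³Writer method itself: a TF-IDF retriever ranks previously written records by similarity to the current patient note, and the top matches are prepended to the generation prompt. The corpus, query, and helper functions here are purely illustrative.

# Generic retrieval-augmented generation sketch (not the Re3Writer method):
# retrieve the most similar stored records and prepend them to the prompt.
# The records, query, and prompt wording below are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

records = [
    "Discharge: hypertension controlled, continue amlodipine, low-salt diet.",
    "Discharge: type 2 diabetes, metformin titration, monitor fasting glucose.",
    "Discharge: community-acquired pneumonia, complete antibiotic course.",
]

vectorizer = TfidfVectorizer()
record_vecs = vectorizer.fit_transform(records)

def retrieve(query, k=2):
    # Rank stored records by cosine similarity to the query.
    sims = cosine_similarity(vectorizer.transform([query]), record_vecs)[0]
    return [records[i] for i in sims.argsort()[::-1][:k]]

def build_prompt(patient_note):
    context = "\n".join(retrieve(patient_note))
    return (f"Reference instructions from similar patients:\n{context}\n\n"
            f"Patient note:\n{patient_note}\n\n"
            "Write discharge instructions grounded in the references above.")

print(build_prompt("Elderly patient admitted with poorly controlled hypertension."))

A causal LM (for example, the fine-tuned model from the pipeline sketch above) would then generate discharge instructions from this prompt, grounding its output in the retrieved references.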