HuatuoGPT, towards Taming Language Model to Be a Doctor
- URL: http://arxiv.org/abs/2305.15075v1
- Date: Wed, 24 May 2023 11:56:01 GMT
- Title: HuatuoGPT, towards Taming Language Model to Be a Doctor
- Authors: Hongbo Zhang and Junying Chen and Feng Jiang and Fei Yu and Zhihong
Chen and Jianquan Li and Guiming Chen and Xiangbo Wu and Zhiyi Zhang and
Qingying Xiao and Xiang Wan and Benyou Wang and Haizhou Li
- Abstract summary: HuatuoGPT is a large language model (LLM) for medical consultation.
We leverage both distilled data from ChatGPT and real-world data from doctors in the supervised fine-tuning stage.
- Score: 67.96794664218318
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present HuatuoGPT, a large language model (LLM) for medical
consultation. The core recipe of HuatuoGPT is to leverage both
\textit{distilled data from ChatGPT} and \textit{real-world data from doctors}
in the supervised fine-tuning stage. The responses of ChatGPT are usually
detailed, well-presented, and informative, but it cannot perform like a doctor
in many aspects, e.g., integrative diagnosis. We argue that real-world data
from doctors is complementary to distilled data in the sense that the former
can tame a distilled language model to perform like a doctor. To better
leverage the strengths of both data sources, we train a reward model to align
the language model with the merits that both bring, following an RLAIF
(reinforcement learning from AI feedback) approach. To evaluate and benchmark the
models, we propose a comprehensive evaluation scheme (including automatic and
manual metrics). Experimental results demonstrate that HuatuoGPT achieves
state-of-the-art results in performing medical consultation among open-source
LLMs in GPT-4 evaluation, human evaluation, and medical benchmark datasets. It
is worth noting that by using additional real-world data and RLAIF, the
distilled language model (i.e., HuatuoGPT) outperforms its teacher model
ChatGPT in most cases. Our code, data, and models are publicly available at
\url{https://github.com/FreedomIntelligence/HuatuoGPT}. The online demo is
available at \url{https://www.HuatuoGPT.cn/}.
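
The abstract describes a two-part recipe: supervised fine-tuning on a mixture of ChatGPT-distilled dialogues and real doctor dialogues, followed by a reward model used RLAIF-style. The sketch below illustrates that recipe only; the `Dialogue` fields, the 1:1 mixing, and the toy reward heuristic are assumptions for illustration, not the authors' released code.

```python
# Minimal, self-contained sketch of the recipe described above (not the released
# HuatuoGPT code): (1) mix ChatGPT-distilled dialogues with real doctor dialogues
# for SFT, and (2) rank candidate responses with a reward signal, RLAIF-style.
import random
from dataclasses import dataclass

@dataclass
class Dialogue:
    question: str   # patient query
    response: str   # reference answer
    source: str     # "chatgpt_distilled" or "doctor_real"

def mix_sft_data(distilled: list[Dialogue], real_world: list[Dialogue],
                 seed: int = 0) -> list[Dialogue]:
    """Shuffle both sources together so every SFT batch sees ChatGPT-style
    informative answers as well as doctor-style interactive ones."""
    mixed = list(distilled) + list(real_world)
    random.Random(seed).shuffle(mixed)
    return mixed

def toy_reward(response: str) -> float:
    """Stand-in for the learned reward model: favor answers that are informative
    (longer) and doctor-like (ask a clarifying question). Purely illustrative."""
    return len(response.split()) + (20.0 if "?" in response else 0.0)

def pick_preferred(candidates: list[str], reward=toy_reward) -> str:
    """Keep the candidate the reward scores highest, i.e. the preference signal
    an RLAIF loop would feed back into training."""
    return max(candidates, key=reward)

if __name__ == "__main__":
    sft_set = mix_sft_data(
        [Dialogue("What causes migraines?", "Common triggers include stress, sleep loss...", "chatgpt_distilled")],
        [Dialogue("I have a headache.", "How long has it lasted, and where exactly is the pain?", "doctor_real")],
    )
    print(len(sft_set), "SFT examples")
    print(pick_preferred(["Take some rest.", "How long has it lasted? Any nausea or fever?"]))
```

In the actual pipeline the reward model is itself trained from AI feedback over paired responses before it is used to optimize the policy; the heuristic above only stands in for that learned scorer.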
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
(arXiv: 2024-10-03)
- Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation [1.922611370494431]
This study evaluates the performance of large language models (LLMs) as medical agents in Portuguese.
The InternLM2 model, with initial training on medical data, presented the best overall performance.
DrBode models, derived from ChatBode, exhibited a phenomenon of catastrophic forgetting of acquired medical knowledge.
(arXiv: 2024-09-30)
- Efficient Medical Question Answering with Knowledge-Augmented Question Generation [5.145812785735094]
We introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach.
We first fine-tune the model on a corpus of medical textbooks.
Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model.
(arXiv: 2024-05-23)
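
The entry above describes a two-step recipe: fine-tune on medical textbooks, then fine-tune again on questions a strong LLM generates from textbook passages. A minimal sketch of the second step follows; the prompt wording and the `generate` callable (any text-in, text-out LLM wrapper) are assumptions, not the paper's actual prompts.

```python
# Sketch of knowledge-augmented question generation (illustrative, not the paper's code):
# prompt a strong LLM with a textbook passage plus one example of the downstream task,
# then keep the generated QA pairs as a second-stage fine-tuning set.
def build_prompt(passage: str, task_example: str, n_questions: int = 3) -> str:
    return (
        "You write exam-style medical questions.\n\n"
        f"Textbook excerpt:\n{passage}\n\n"
        f"Example of the target question format:\n{task_example}\n\n"
        f"Write {n_questions} new questions with answers, grounded only in the excerpt."
    )

def generate_synthetic_qa(passages: list[str], task_example: str, generate) -> list[dict]:
    """`generate` is any text-in/text-out LLM callable (e.g., a GPT-4 API wrapper)."""
    dataset = []
    for passage in passages:
        prompt = build_prompt(passage, task_example)
        dataset.append({"prompt": prompt, "completion": generate(prompt)})
    return dataset
```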
- PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical Imaging [8.043625583479598]
Multimodal large language models (MLLMs) represent an evolutionary expansion in the capabilities of traditional large language models.
Recent works investigate the adaptation of MLLMs as a universal solution to address medical multi-modal problems as a generative task.
We propose a parameter efficient framework for fine-tuning MLLMs, specifically validated on medical visual question answering (Med-VQA) and medical report generation (MRG) tasks.
(arXiv: 2024-01-05)
- Can ChatGPT be Your Personal Medical Assistant? [0.09264362806173355]
This study uses publicly available online question-and-answer datasets in the Arabic language.
There are almost 430K questions and answers for 20 disease-specific categories.
The performance of this fine-tuned model was evaluated through automated and human evaluation.
(arXiv: 2023-12-19)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
(arXiv: 2023-06-08)
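
As a rough illustration of the extraction step described above (not TEMED-LLM itself), the sketch below asks an LLM to fill a fixed schema from a free-text report and keeps only the schema fields, so every report yields the same columns for a downstream, interpretable classifier. The schema, the prompt, and the `generate` callable are assumptions.

```python
import json

# Illustrative schema; the fields actually extracted are defined per task in the paper.
SCHEMA = {"age": "integer or null", "smoker": "true/false/null", "key_finding": "short string or null"}

def extraction_prompt(report: str) -> str:
    return (
        "Extract the following fields from the medical report. Answer with JSON only.\n"
        f"Fields: {json.dumps(SCHEMA)}\n\nReport:\n{report}"
    )

def extract_row(report: str, generate) -> dict:
    """`generate` is any text-in/text-out LLM callable (assumed). Returns one table row."""
    raw = generate(extraction_prompt(report))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    # keep only schema keys so malformed outputs still yield consistent columns
    return {key: parsed.get(key) for key in SCHEMA}
```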
- The Curse of Recursion: Training on Generated Data Makes Models Forget [70.02793975243212]
Large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images.
We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear.
(arXiv: 2023-05-27)
- Catch Me If You Can: Identifying Fraudulent Physician Reviews with Large Language Models Using Generative Pre-Trained Transformers [1.0499611180329804]
The proliferation of fake reviews of doctors has potentially detrimental consequences for patient well-being.
This study utilizes a novel pre-labeled dataset of 38048 physician reviews to establish the effectiveness of large language models in classifying reviews.
(arXiv: 2023-04-19)
- Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study [115.96080028033904]
We study a scalable pre-trained retrieval-augmented LM (i.e., RETRO) compared with standard GPT and retrieval-augmented GPT.
Our findings highlight the promising direction of pretraining autoregressive LMs with retrieval as future foundation models.
(arXiv: 2023-04-13)
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
(arXiv: 2021-06-17)
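
The router method in the last entry can be pictured as a small classifier that decides, per input example, whether the KG-embedding scorer or the LM-based scorer should answer. The features and the two scorer stubs below are illustrative assumptions, not the paper's implementation.

```python
# Minimal learned-router sketch (illustrative): train a classifier to predict which
# of the two link-prediction models is more likely to be correct on a given example,
# then dispatch to that model at inference time.
from sklearn.linear_model import LogisticRegression

def route_features(example: dict) -> list[float]:
    # e.g., how frequent the query entity is in the KG, and how much text is available
    return [float(example["entity_freq"]), float(len(example["text"]))]

def fit_router(examples: list[dict], kg_correct: list[bool], lm_correct: list[bool]):
    """Label 1 means 'send this example to the LM-based model'."""
    X = [route_features(e) for e in examples]
    y = [int(lm and not kg) for kg, lm in zip(kg_correct, lm_correct)]
    return LogisticRegression().fit(X, y)

def routed_score(example: dict, router, kg_score, lm_score) -> float:
    use_lm = router.predict([route_features(example)])[0] == 1
    return lm_score(example) if use_lm else kg_score(example)
```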
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.