Qibo: A Large Language Model for Traditional Chinese Medicine
- URL: http://arxiv.org/abs/2403.16056v3
- Date: Sat, 22 Jun 2024 05:43:53 GMT
- Title: Qibo: A Large Language Model for Traditional Chinese Medicine
- Authors: Heyi Zhang, Xin Wang, Zhaopeng Meng, Zhe Chen, Pengwei Zhuang, Yongzhe Jia, Dawei Xu, Wenbin Guo,
- Abstract summary: In traditional Chinese medicine, there are challenges such as the essential differences between theory and modern medicine.
We propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning.
A notable contribution of our study is the processing of a 2GB corpus dedicated to TCM.
- Score: 10.394665777883064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have made significant progress in a number of professional fields, including medicine, law, and finance. However, in traditional Chinese medicine (TCM), there are challenges such as the essential differences between theory and modern medicine, the lack of specialized corpus resources, and the fact that relying only on supervised fine-tuning may lead to overconfident predictions. To address these challenges, we propose a two-stage training approach that combines continuous pre-training and supervised fine-tuning. A notable contribution of our study is the processing of a 2GB corpus dedicated to TCM, constructing pre-training and instruction fine-tuning datasets for TCM, respectively. In addition, we have developed Qibo-Benchmark, a tool that evaluates the performance of LLMs in TCM on multiple dimensions, including subjective, objective, and three TCM NLP tasks. The medical LLM trained with our pipeline, named Qibo, exhibits significant performance boosts. Compared to the baselines, the average subjective win rate is 63%, the average objective accuracy improved by 23% to 58%, and the Rouge-L scores for the three TCM NLP tasks are 0.72, 0.61, and 0.55. Finally, we propose a pipeline to apply Qibo to TCM consultation and demonstrate the model performance through a case study.
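The abstract reports Rouge-L scores for the three TCM NLP tasks. As a rough illustration of what that metric measures, the sketch below computes an LCS-based Rouge-L F-score between a candidate and a reference token sequence. It is a minimal sketch, not the paper's evaluation code: the function names, the `beta` parameter default, and the choice of tokenization (for Chinese text, character-level tokens are one plausible assumption) are all hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """LCS-based Rouge-L F-score (assumed form; beta=1 weights P and R equally).

    `candidate` and `reference` are sequences of tokens, e.g. lists of
    characters or words. Tokenization is left to the caller.
    """
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Hypothetical example with character-level tokens.
    print(rouge_l(list("abcd"), list("acde")))
```

Published Rouge-L numbers depend on tokenization and on whether scores are averaged per sample or per corpus, so values from this sketch are not directly comparable to those in the abstract.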
Related papers
- Training Compute-Optimal Protein Language Models [48.79416103951816]
Most protein language models are trained with extensive compute resources until performance gains plateau.
Our investigation is grounded in a massive dataset consisting of 939 million protein sequences.
We trained over 300 models ranging from 3.5 million to 10.7 billion parameters on 5 to 200 billion unique tokens.
arXiv Detail & Related papers (2024-11-04T14:58:37Z) - Enhancing the Traditional Chinese Medicine Capabilities of Large Language Model through Reinforcement Learning from AI Feedback [5.855520522078306]
We propose a framework to improve the performance of large language models for Traditional Chinese Medicine (TCM) tasks using only a small amount of data.
We use medical case data for supervised fine-tuning of the large model, making it initially capable of performing TCM tasks.
We further optimize the model's performance using reinforcement learning from AI feedback (RLAIF) to align it with the preference data.
arXiv Detail & Related papers (2024-11-01T04:19:55Z) - LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z) - Adapting LLMs for the Medical Domain in Portuguese: A Study on Fine-Tuning and Model Evaluation [1.922611370494431]
This study evaluates the performance of large language models (LLMs) as medical agents in Portuguese.
The InternLM2 model, with initial training on medical data, presented the best overall performance.
DrBode models, derived from ChatBode, exhibited a phenomenon of catastrophic forgetting of acquired medical knowledge.
arXiv Detail & Related papers (2024-09-30T19:10:03Z) - Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z) - Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge [0.0]
We present a TCM question dataset named TCM-QA, which comprises three question types: single choice, multiple choice, and true or false.
In our study, we evaluate two settings of the LLM, zero-shot and few-shot settings, while concurrently discussing the differences between English and Chinese prompts.
Our results indicate that ChatGPT performs best on true-or-false questions, achieving its highest precision of 0.688, while its lowest precision, 0.241, is on multiple-choice questions.
arXiv Detail & Related papers (2024-03-14T08:20:40Z) - ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z) - TCM-GPT: Efficient Pre-training of Large Language Models for Domain Adaptation in Traditional Chinese Medicine [11.537289359051975]
We propose a novel TCMDA (TCM Domain Adaptation) approach: efficient pre-training with a domain-specific corpus.
Specifically, we first construct a large TCM-specific corpus, TCM-Corpus-1B, by identifying domain keywords and retrieving from a general corpus.
Then, our TCMDA leverages LoRA, which freezes the pretrained model's weights and uses rank decomposition matrices to efficiently train specific dense layers for pre-training and fine-tuning.
arXiv Detail & Related papers (2023-11-03T08:54:50Z) - Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model [41.11769935795965]
We present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), Supervised Fine-tuning (SFT), and Direct Preference Optimization (DPO).
In the CPT and SFT phases, Qilin-Med achieved 38.4% and 40.0% accuracy on the CMExam test set, respectively.
In the DPO phase, it scored 16.66 in BLEU-1 and 27.44 in ROUGE-1 on the Huatuo-26M test set, a further improvement over the SFT phase (12.69 in BLEU-1 and 24.21 in ROUGE-1).
arXiv Detail & Related papers (2023-10-13T13:17:03Z) - Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)