A Continued Pretrained LLM Approach for Automatic Medical Note Generation
- URL: http://arxiv.org/abs/2403.09057v3
- Date: Wed, 3 Apr 2024 18:08:30 GMT
- Title: A Continued Pretrained LLM Approach for Automatic Medical Note Generation
- Authors: Dong Yuan, Eti Rastogi, Gautam Naik, Sree Prasanna Rajagopal, Sagar Goyal, Fen Zhao, Bharath Chintagunta, Jeff Ward
- Abstract summary: We introduce HEAL, the first continuously trained 13B LLaMA2-based LLM that is purpose-built for medical conversations and measured on automated scribing.
Our results demonstrate that HEAL outperforms GPT-4 and PMC-LLaMA in PubMedQA, with an accuracy of 78.4%.
Remarkably, HEAL surpasses GPT-4 and Med-PaLM 2 in identifying more correct medical concepts and exceeds the performance of human scribes in correctness and completeness.
- Score: 10.981182525560751
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: LLMs are revolutionizing NLP tasks. However, the use of the most advanced LLMs, such as GPT-4, is often prohibitively expensive for most specialized fields. We introduce HEAL, the first continuously trained 13B LLaMA2-based LLM that is purpose-built for medical conversations and measured on automated scribing. Our results demonstrate that HEAL outperforms GPT-4 and PMC-LLaMA in PubMedQA, with an accuracy of 78.4%. It also achieves parity with GPT-4 in generating medical notes. Remarkably, HEAL surpasses GPT-4 and Med-PaLM 2 in identifying more correct medical concepts and exceeds the performance of human scribes and other comparable models in correctness and completeness.
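Continued pretraining, as used for HEAL, resumes the standard causal language-modeling objective on in-domain text starting from a general checkpoint. A minimal sketch of that objective (names, shapes, and data are illustrative assumptions, not the authors' code):

```python
import numpy as np

def causal_lm_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean next-token cross-entropy.

    logits: (seq_len, vocab_size) unnormalized scores at each position.
    targets: (seq_len,) ids of the observed next tokens.
    """
    # Numerically stable log-softmax over the vocabulary dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each observed next token, averaged.
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll.mean())
```

Continued pretraining simply keeps minimizing this loss, but on in-domain tokens (here, medical conversations) rather than the original general corpus.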
Related papers
- OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models [4.556924372105915]
Open-source (OS) models represent a key area of growth for medical LLMs.
We present OpenMedLM, a prompting platform which delivers state-of-the-art (SOTA) performance for OS LLMs on medical benchmarks.
arXiv Detail & Related papers (2024-02-29T17:19:39Z)
- MEDITRON-70B: Scaling Medical Pretraining for Large Language Models [91.25119823784705]
Large language models (LLMs) can potentially democratize access to medical knowledge.
We release MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain.
arXiv Detail & Related papers (2023-11-27T18:49:43Z)
- HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs [62.73042700847977]
HuatuoGPT-II has shown state-of-the-art performance in Chinese medicine domain on a number of benchmarks.
It even outperforms proprietary models like ChatGPT and GPT-4 in some aspects, especially in Traditional Chinese Medicine.
arXiv Detail & Related papers (2023-11-16T10:56:24Z)
- ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences [51.66185471742271]
We propose ChiMed-GPT, a benchmark LLM designed explicitly for the Chinese medical domain.
ChiMed-GPT undergoes a comprehensive training regime with pre-training, SFT, and RLHF.
We analyze possible biases by prompting ChiMed-GPT to complete attitude scales regarding discrimination against patients.
arXiv Detail & Related papers (2023-11-10T12:25:32Z)
- A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
- Evaluating Large Language Models in Ophthalmology [34.13457684015814]
The performance of three different large language models (LLMs) in answering ophthalmology professional questions was evaluated.
GPT-4 showed significantly higher answer stability and confidence than GPT-3.5 and PaLM2.
arXiv Detail & Related papers (2023-11-07T16:19:45Z)
- Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model [41.11769935795965]
We present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), Supervised Fine-tuning (SFT), and Direct Preference Optimization (DPO).
In the DCPT and SFT phases, Qilin-Med achieved 38.4% and 40.0% accuracy on the CMExam test set, respectively.
In the DPO phase, it scored 16.66 in BLEU-1 and 27.44 in ROUGE-1 on the Huatuo-26M test set, improving on the SFT phase (12.69 in BLEU-1 and 24.21 in ROUGE-1).
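The BLEU-1 figures above are clipped unigram precision with a brevity penalty. A minimal sketch of that metric (illustrative only, not the paper's exact evaluation script):

```python
import math
from collections import Counter

def bleu1(candidate: list[str], reference: list[str]) -> float:
    """BLEU-1: clipped unigram precision times a brevity penalty."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Each candidate unigram counts at most as often as it appears in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = clipped / max(len(candidate), 1)
    # Penalize candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * precision
```

For example, the candidate "the cat sat" against the reference "the cat sat on the mat" has perfect unigram precision but is penalized for being half the reference length.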
arXiv Detail & Related papers (2023-10-13T13:17:03Z)
- Efficient Finetuning Large Language Models For Vietnamese Chatbot [1.2075778142867704]
Large language models (LLMs) have been shown to achieve remarkable performance across a variety of natural language tasks.
We leverage large-scale instruction-following datasets from open-source projects, namely Alpaca, GPT4All, and Chat-Doctor.
We utilize parameter-efficient tuning through Low-Rank Adaptation (LoRA) on two open LLMs, resulting in four models: Bloomz-Chat, Bloomz-Doctor, GPTJ-Chat, and GPTJ-Doctor.
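The LoRA technique named above freezes a pretrained weight matrix W and trains only two small low-rank factors, using W + (alpha / r) * B @ A at inference. A minimal numpy sketch with illustrative sizes (this follows the general LoRA recipe, not the authors' training code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 2, 4  # illustrative shapes and scale

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    # Base path plus the low-rank correction; because B starts at zero,
    # the adapted model is initially identical to the pretrained one.
    return (W + (alpha / r) * B @ A) @ x
```

Only A and B (r * (d_in + d_out) parameters) are trained, which is why LoRA makes finetuning large open models affordable.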
arXiv Detail & Related papers (2023-09-09T00:11:53Z)
- Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering [54.13933019557655]
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT).
LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules.
We find that medical textbooks, used as a retrieval corpus, form a more effective knowledge base than Wikipedia in the medical domain.
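The core idea, retrieving a passage from a domain corpus and prepending it to the question, can be sketched with a toy term-overlap scorer (the corpus and scoring here are illustrative assumptions, not the paper's plug-and-play modules):

```python
from collections import Counter

# Tiny stand-in for a textbook retrieval corpus.
corpus = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "The Eiffel Tower is located in Paris, France.",
]

def retrieve(question: str, passages: list[str]) -> str:
    """Return the passage sharing the most terms with the question."""
    q_terms = Counter(question.lower().split())
    def score(passage: str) -> int:
        return sum((q_terms & Counter(passage.lower().split())).values())
    return max(passages, key=score)

# Prepend the retrieved passage as context for the (black-box) LLM.
question = "What is a first-line therapy for type 2 diabetes?"
prompt = f"Context: {retrieve(question, corpus)}\nQuestion: {question}"
```

Real systems replace the overlap score with dense or BM25 retrieval, but the augmentation pattern is the same.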
arXiv Detail & Related papers (2023-09-05T13:39:38Z)
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records [60.35217378132709]
Large language models (LLMs) can follow natural language instructions with human-level fluency.
Evaluating LLMs on realistic text generation tasks for healthcare, however, remains challenging.
We introduce MedAlign, a benchmark dataset of 983 natural language instructions for EHR data.
arXiv Detail & Related papers (2023-08-27T12:24:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.