ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data
and Comprehensive Evaluation
- URL: http://arxiv.org/abs/2306.09968v1
- Date: Fri, 16 Jun 2023 16:56:32 GMT
- Title: ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data
and Comprehensive Evaluation
- Authors: Guangyu Wang, Guoxing Yang, Zongxin Du, Longjun Fan, Xiaohu Li
- Abstract summary: Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks.
Despite these advances, their effectiveness in medical applications is limited, due to challenges such as factual inaccuracies, reasoning abilities, and lack grounding in real-world experience.
We present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios.
- Score: 5.690250818139763
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models have exhibited exceptional performance on various
Natural Language Processing (NLP) tasks, leveraging techniques such as the
pre-training, and instruction fine-tuning. Despite these advances, their
effectiveness in medical applications is limited, due to challenges such as
factual inaccuracies, reasoning abilities, and lack grounding in real-world
experience. In this study, we present ClinicalGPT, a language model explicitly
designed and optimized for clinical scenarios. By incorporating extensive and
diverse real-world data, such as medical records, domain-specific knowledge,
and multi-round dialogue consultations in the training process, ClinicalGPT is
better prepared to handle multiple clinical task. Furthermore, we introduce a
comprehensive evaluation framework that includes medical knowledge
question-answering, medical exams, patient consultations, and diagnostic
analysis of medical records. Our results demonstrate that ClinicalGPT
significantly outperforms other models in these tasks, highlighting the
effectiveness of our approach in adapting large language models to the critical
domain of healthcare.
Related papers
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce textbfAI Hospital, a framework simulating dynamic medical interactions between emphDoctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Generative Large Language Models are autonomous practitioners of
evidence-based medicine [27.229179922424063]
Evidence-based medicine (EBM) is fundamental to modern clinical practice, requiring clinicians to continually update their knowledge and apply the best clinical evidence in patient care.
The practice of EBM faces challenges due to rapid advancements in medical research, leading to information overload for clinicians.
The integration of artificial intelligence (AI), specifically Generative Large Language Models (LLMs), offers a promising solution towards managing this complexity.
arXiv Detail & Related papers (2024-01-05T15:09:57Z) - Preserving the knowledge of long clinical texts using aggregated
ensembles of large language models [0.0]
Clinical texts contain rich and valuable information that can be used for various clinical outcome prediction tasks.
Applying large language models, such as BERT-based models, to clinical texts poses two major challenges.
This paper proposes a novel method to preserve the knowledge of long clinical texts using aggregated ensembles of large language models.
arXiv Detail & Related papers (2023-11-02T19:50:02Z) - Emulating Human Cognitive Processes for Expert-Level Medical
Question-Answering with Large Language Models [0.23463422965432823]
BooksMed is a novel framework based on a Large Language Model (LLM)
It emulates human cognitive processes to deliver evidence-based and reliable responses.
We present ExpertMedQA, a benchmark comprised of open-ended, expert-level clinical questions.
arXiv Detail & Related papers (2023-10-17T13:39:26Z) - Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain [13.912870728383396]
Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters.
We propose a two-step PEFT framework and evaluate it in the clinical domain.
arXiv Detail & Related papers (2023-07-06T15:06:41Z) - Almanac: Retrieval-Augmented Language Models for Clinical Medicine [1.5505279143287174]
We develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations.
Performance on a novel dataset of clinical scenarios evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality.
arXiv Detail & Related papers (2023-03-01T02:30:11Z) - Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z) - Benchmarking Automated Clinical Language Simplification: Dataset,
Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z) - Predicting Clinical Diagnosis from Patients Electronic Health Records
Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in medical community.
We present a modification of Bidirectional Representations from Transformers (BERT) model for classification sequence.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.