Do We Still Need Clinical Language Models?
- URL: http://arxiv.org/abs/2302.08091v1
- Date: Thu, 16 Feb 2023 05:08:34 GMT
- Title: Do We Still Need Clinical Language Models?
- Authors: Eric Lehman, Evan Hernandez, Diwakar Mahajan, Jonas Wulff, Micah J.
Smith, Zachary Ziegler, Daniel Nadler, Peter Szolovits, Alistair Johnson,
Emily Alsentzer
- Abstract summary: We show that relatively small specialized clinical models substantially outperform all in-context learning approaches.
We release the code and the models used under the PhysioNet Credentialed Health Data license and data use agreement.
- Score: 15.023633270864675
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although recent advances in scaling large language models (LLMs) have
resulted in improvements on many NLP tasks, it remains unclear whether these
models trained primarily with general web text are the right tool in highly
specialized, safety-critical domains such as clinical text. Recent results have
suggested that LLMs encode a surprising amount of medical knowledge. This
raises an important question regarding the utility of smaller domain-specific
language models. With the success of general-domain LLMs, is there still a need
for specialized clinical models? To investigate this question, we conduct an
extensive empirical analysis of 12 language models, ranging from 220M to 175B
parameters, measuring their performance on 3 different clinical tasks that test
their ability to parse and reason over electronic health records. As part of
our experiments, we train T5-Base and T5-Large models from scratch on clinical
notes from MIMIC III and IV to directly investigate the efficiency of clinical
tokens. We show that relatively small specialized clinical models substantially
outperform all in-context learning approaches, even when finetuned on limited
annotated data. Further, we find that pretraining on clinical tokens allows for
smaller, more parameter-efficient models that either match or outperform much
larger language models trained on general text. We release the code and the
models used under the PhysioNet Credentialed Health Data license and data use
agreement.
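As a concrete illustration of the finetuning regime that the abstract compares against in-context learning, here is a minimal sketch, assuming a text-to-text task format and placeholder data; it is not the authors' released code, and the task prefix, example, and hyperparameters are illustrative only.

```python
# Minimal sketch: finetuning a small T5 on a clinical task cast as
# text-to-text. The example pair and hyperparameters are placeholders;
# real inputs would come from MIMIC under the PhysioNet data use agreement.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical annotated example (NLI over clinical sentences).
examples = [("mednli premise: The patient is afebrile. "
             "hypothesis: The patient has a fever.", "contradiction")]

model.train()
for source, target in examples:
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```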
Related papers
- Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
We also construct MedS-Ins, an instruction-tuning dataset comprising 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z)
- Is larger always better? Evaluating and prompting large language models for non-generative medical tasks [11.799956298563844]
This study benchmarks various models, including GPT-based LLMs, BERT-based models, and traditional clinical predictive models.
We focused on tasks such as readmission prediction, disease hierarchy reconstruction, and biomedical sentence matching.
Results indicated that LLMs exhibited robust zero-shot predictive capabilities on structured EHR data when using well-designed prompting strategies.
For unstructured medical texts, LLMs did not outperform finetuned BERT models, which excelled in both supervised and unsupervised tasks.
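To make the "well-designed prompting strategies" concrete, here is a toy sketch of a zero-shot readmission prompt over serialized structured EHR fields; the template and the serialize() helper are illustrative assumptions, not the paper's exact design.

```python
# Toy zero-shot prompting sketch for 30-day readmission prediction from
# structured EHR fields. The serialize() helper and the prompt template
# are hypothetical, not the paper's exact strategy.
def serialize(record: dict) -> str:
    """Flatten structured EHR fields into a readable string."""
    return "; ".join(f"{k}: {v}" for k, v in record.items())

def readmission_prompt(record: dict) -> str:
    return ("Given this hospital admission, answer 'yes' or 'no': "
            "will the patient be readmitted within 30 days?\n"
            f"Admission: {serialize(record)}\nAnswer:")

print(readmission_prompt({"age": 67, "diagnosis": "CHF",
                          "length_of_stay_days": 5}))
```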
arXiv Detail & Related papers (2024-07-26T06:09:10Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
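The text side of that subfigure-to-subcaption challenge can be sketched in a few lines: split a compound PubMed-style caption on its panel markers. The regex and caption below are assumptions; a real pipeline also needs image-side subfigure detection.

```python
# Toy sketch of the subcaption side of subfigure-subcaption mapping:
# split a compound caption on "(a)", "(b)", ... markers. Illustrative only.
import re

caption = "(a) Axial T1 MRI showing a lesion. (b) Corresponding T2 FLAIR."
parts = re.split(r"\(([a-z])\)\s*", caption)  # ['', 'a', text, 'b', text]
subcaptions = {parts[i]: parts[i + 1].strip()
               for i in range(1, len(parts), 2)}
print(subcaptions)  # {'a': 'Axial T1 MRI showing a lesion.', 'b': '...'}
```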
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
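In the spirit of CheXprompt, a hedged LLM-as-judge sketch might look like the following; the grading prompt is an assumption and does not reproduce the metric's actual rubric.

```python
# Hedged LLM-as-judge sketch for report factuality, loosely in the
# spirit of CheXprompt; the prompt below is an assumption.
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

def list_factual_errors(reference: str, candidate: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": ("Compare the candidate radiology report against "
                        "the reference and list any factual errors, one "
                        f"per line.\nReference: {reference}\n"
                        f"Candidate: {candidate}"),
        }],
    )
    return resp.choices[0].message.content
```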
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- SoftTiger: A Clinical Foundation Model for Healthcare Workflows [5.181665205189493]
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare.
We collect and annotate data for three subtasks, namely, international patient summary, clinical impression and medical encounter.
We then perform supervised fine-tuning of a state-of-the-art LLM using public and credentialed clinical data.
arXiv Detail & Related papers (2024-03-01T04:39:16Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate the high inference cost this introduces, we developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
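Feature-level distillation generically regresses student hidden states onto the teacher's through a learned projection; a minimal sketch of that idea, not the paper's exact loss, is below.

```python
# Generic feature-level knowledge distillation: regress the student's
# hidden states onto the (frozen) teacher's through a learned projection.
# Dimensions are illustrative; this is not the paper's exact loss.
import torch
import torch.nn as nn

teacher_dim, student_dim = 1024, 256
proj = nn.Linear(student_dim, teacher_dim)  # aligns feature spaces

def feature_distillation_loss(student_h, teacher_h):
    # student_h: (batch, student_dim); teacher_h: (batch, teacher_dim)
    return nn.functional.mse_loss(proj(student_h), teacher_h.detach())

loss = feature_distillation_loss(torch.randn(8, student_dim),
                                 torch.randn(8, teacher_dim))
loss.backward()  # gradients flow into the student side and the projection
```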
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- Dynamic Q&A of Clinical Documents with Large Language Models [3.021316686584699]
This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes.
Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands.
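A minimal retrieve-then-read sketch of this kind of pipeline, with an illustrative embedding model and toy note chunks rather than the paper's configuration:

```python
# Minimal retrieve-then-read sketch: embed note chunks, fetch the chunk
# nearest the question, and pass it to an LLM as context. The embedding
# model and chunks are illustrative, not the paper's setup.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Patient admitted with chest pain; troponin negative.",
          "Discharged on aspirin 81 mg daily and atorvastatin."]
chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)

question = "What medications was the patient discharged on?"
q_vec = embedder.encode(question, convert_to_tensor=True)
best = util.cos_sim(q_vec, chunk_vecs).argmax().item()
context = chunks[best]  # prepended to the question in the LLM prompt
```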
arXiv Detail & Related papers (2024-01-19T14:50:22Z)
- Generalist embedding models are better at short-context clinical semantic search than specialized embedding models [0.9296448006507203]
We construct a dataset based on ICD-10-CM code descriptions and easily reproducible rephrasings of them.
We benchmarked existing embedding models, either generalist or specialized in the clinical domain, in a semantic search task.
Results showed that generalist models performed better than clinical models, suggesting that existing clinically specialized models are overly sensitive to small input changes, such as rephrasing, that confuse them.
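The robustness probe implied here is easy to sketch: check whether a model maps a rephrased ICD-10-CM description back to its original wording. The model name and rephrasing below are illustrative, and any generalist or clinical embedder can be swapped in.

```python
# Sketch of the rephrasing-robustness probe: does the embedder map a
# reworded ICD-10-CM description back to the original wording?
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice
originals = ["Type 2 diabetes mellitus without complications",
             "Essential (primary) hypertension"]
rephrased = "diabetes type II, no complications"

sims = util.cos_sim(model.encode(rephrased), model.encode(originals))
print(originals[sims.argmax().item()])  # should recover the first entry
```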
arXiv Detail & Related papers (2024-01-03T19:03:32Z)
- Lightweight Transformers for Clinical Natural Language Processing [9.532776962985828]
This study focuses on the development of compact language models for processing clinical texts.
We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning.
Our evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks.
arXiv Detail & Related papers (2023-02-09T16:07:31Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
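A hedged sketch of that transfer recipe, assuming an illustrative multilingual encoder and label set: fine-tune for token classification on high-resource annotations, then continue on a handful of code-mixed target-language examples.

```python
# Sketch of few-shot cross-lingual transfer for de-identification NER.
# Model choice and PHI label set are illustrative assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "xlm-roberta-base"  # multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=3)  # e.g., O / B-PHI / I-PHI
# Step 1: fine-tune on high-resource (e.g., English) PHI annotations.
# Step 2: continue training on a few dozen Spanish-Catalan code-mixed
# notes with the same token-classification objective.
```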
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.