Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
- URL: http://arxiv.org/abs/2406.16611v1
- Date: Mon, 24 Jun 2024 12:52:02 GMT
- Title: Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
- Authors: Andrea Posada, Daniel Rueckert, Felix Meissen, Philip Müller
- Abstract summary: We provide a comprehensive survey of language models in the medical domain.
Our subset encompasses 53 models, ranging from 110 million to 13 billion parameters.
Our findings reveal remarkable performance across various tasks and datasets.
- Score: 10.39989311209284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the emergence of the Transformer architecture, language model development has increased, driven by their promising potential. However, releasing these models into production requires properly understanding their behavior, particularly in sensitive domains such as medicine. Despite this need, the medical literature still lacks technical assessments of pre-trained language models, which are especially valuable in resource-constrained settings in terms of computational power or limited budget. To address this gap, we provide a comprehensive survey of language models in the medical domain. In addition, we selected a subset of these models for thorough evaluation, focusing on classification and text generation tasks. Our subset encompasses 53 models, ranging from 110 million to 13 billion parameters, spanning the three families of Transformer-based models and from diverse knowledge domains. This study employs a series of approaches for text classification together with zero-shot prompting instead of model training or fine-tuning, which closely resembles the limited resource setting in which many users of language models find themselves. Encouragingly, our findings reveal remarkable performance across various tasks and datasets, underscoring the latent potential of certain models to contain medical knowledge, even without domain specialization. Consequently, our study advocates for further exploration of model applications in medical contexts, particularly in resource-constrained settings. The code is available on https://github.com/anpoc/Language-models-in-medicine.
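To make the evaluation setting concrete, the following is a minimal sketch of zero-shot text classification with a pre-trained NLI model and no training or fine-tuning, mirroring the resource-constrained setup described in the abstract. The checkpoint and candidate labels are illustrative placeholders, not the specific models or datasets evaluated in the paper.
```python
# Minimal sketch of zero-shot text classification without any training or
# fine-tuning, mirroring the resource-constrained setting described above.
# The checkpoint and label set are illustrative, not the paper's exact setup.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

note = "Patient reports chest pain radiating to the left arm and shortness of breath."
candidate_labels = ["cardiology", "dermatology", "neurology"]  # hypothetical label set

result = classifier(note, candidate_labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```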
Related papers
- Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages [1.3699492682906507]
Language-specific models substantially outperformed both general and domain-specific models in generating radiology reports.
Models fine-tuned with medical terminology exhibited enhanced performance across all languages.
arXiv Detail & Related papers (2025-05-02T08:14:03Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
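As a minimal illustration of the in-context learning setup in the entry above, the sketch below assembles a few-shot classification prompt in plain Python; the examples and label set are hypothetical, and the resulting string would be passed to any instruction-tuned model.
```python
# Minimal sketch of in-context (few-shot) classification via a natural-language
# prompt, rather than training a task-specific classifier. Examples and labels
# are hypothetical; the prompt would be sent to any instruction-tuned LLM.
FEW_SHOT_EXAMPLES = [
    ("The union announced a strike over pay conditions.", "politics"),
    ("The striker scored twice in the second half.", "sports"),
]

def build_prompt(text: str, labels: list[str]) -> str:
    lines = [f"Classify each text into one of: {', '.join(labels)}."]
    for example_text, example_label in FEW_SHOT_EXAMPLES:
        lines.append(f"Text: {example_text}\nLabel: {example_label}")
    lines.append(f"Text: {text}\nLabel:")
    return "\n\n".join(lines)

print(build_prompt("Parliament passed the new budget bill.", ["politics", "sports"]))
```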
- Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting [12.166472806042592]
Automatic extraction of medical information from clinical documents poses several challenges.
Recent advances in domain-adaptation and prompting methods showed promising results with minimal training data.
We demonstrate that a lightweight, domain-adapted pretrained model, prompted with just 20 shots, outperforms a traditional classification model by 30.5% in accuracy.
arXiv Detail & Related papers (2024-03-20T08:01:33Z)
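One common way to realize the prompting setup in the entry above is cloze-style prompting with a masked language model; the sketch below is a hedged illustration of that idea, using a generic checkpoint and a hypothetical verbalizer in place of the paper's domain-adapted model and label words.
```python
# Minimal sketch of cloze-style prompting with a masked language model.
# The checkpoint and verbalizer words are placeholders; a domain-adapted
# clinical checkpoint would be swapped in for real use.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

note = "The patient presents with persistent cough and high fever. This case is [MASK]."
verbalizer = ["urgent", "routine"]  # hypothetical label words

for prediction in unmasker(note, targets=verbalizer):
    print(prediction["token_str"], round(prediction["score"], 4))
```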
- MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation [22.986061896641083]
MedEval is a multi-level, multi-task, and multi-domain medical benchmark to facilitate the development of language models for healthcare.
With 22,779 collected sentences and 21,228 reports, we provide expert annotations at multiple levels, offering a granular potential usage of the data.
arXiv Detail & Related papers (2023-10-21T18:59:41Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
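The entry above mentions measuring confidence calibration; as a generic illustration (not necessarily the exact protocol used in L2CEval), the sketch below computes expected calibration error over equally spaced confidence bins.
```python
# Sketch of one common way to quantify confidence calibration: expected
# calibration error (ECE) over equally spaced confidence bins. Generic metric,
# not necessarily the exact protocol used in L2CEval.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Weighted average gap between mean confidence and accuracy per bin.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy usage: model confidences vs. whether each generated program passed its tests.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.4], [1, 1, 0, 1]))
```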
- An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT [80.33783969507458]
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians.
Recent studies have achieved promising results in automatic impression generation using large-scale medical text data.
These models often require substantial amounts of medical text data and have poor generalization performance.
arXiv Detail & Related papers (2023-04-17T17:13:42Z)
- Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models [0.987336898133886]
We present two approaches to derive biomedical language models in languages other than English.
One is based on neural machine translation of English resources, favoring quantity over quality.
The other is based on a high-grade, narrow-scoped corpus written in Italian, thus preferring quality over quantity.
arXiv Detail & Related papers (2022-12-20T16:59:56Z)
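A minimal sketch of the translation-based route described above, using a public English-to-Italian machine translation checkpoint as a stand-in; the paper does not necessarily rely on this model or library.
```python
# Sketch of building a target-language biomedical corpus by machine-translating
# English text. The MT checkpoint is a public stand-in, not the authors' system.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-it")

english_sentences = [
    "The patient was diagnosed with type 2 diabetes mellitus.",
    "Antibiotic therapy was started after the blood culture results.",
]
italian_corpus = [t["translation_text"] for t in translator(english_sentences)]
print(italian_corpus)
```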
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work shows a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
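As a rough illustration of the character language models compared in the entry above, the sketch below scores candidate spellings with a toy character n-gram model; the training words, n-gram order, and smoothing are all illustrative.
```python
# Toy character n-gram language model for scoring candidate spellings.
# Training data, n-gram order, and add-one smoothing are illustrative only.
import math
from collections import Counter

def char_ngrams(word, n=3):
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))]

def train(words, n=3):
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts

def log_score(word, counts, n=3):
    # Sum of add-one-smoothed log-probabilities of the word's character n-grams.
    total = sum(counts.values()) + len(counts) + 1
    return sum(math.log((counts[g] + 1) / total) for g in char_ngrams(word, n))

model = train(["hello", "help", "held", "hold"])  # tiny toy target-language sample
for candidate in ["helo", "hxlo"]:
    print(candidate, round(log_score(candidate, model), 3))
```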
- CBAG: Conditional Biomedical Abstract Generation [1.2633386045916442]
We propose a transformer-based conditional language model with a shallow encoder "condition" stack, and a deep "language model" stack of multi-headed attention blocks.
We generate biomedical abstracts given only a proposed title, an intended publication year, and a set of keywords.
arXiv Detail & Related papers (2020-02-13T17:11:33Z)
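To make the two-stack layout described in the CBAG entry concrete, here is a minimal PyTorch sketch of a shallow "condition" encoder feeding a deeper autoregressive "language model" stack; the layer counts, dimensions, and vocabulary size are illustrative, not the paper's actual configuration.
```python
# Sketch of the described layout: a shallow encoder stack for the condition
# (title, year, keywords) and a deeper multi-headed-attention language-model
# stack attending to it. All sizes are illustrative, not CBAG's configuration.
import torch
import torch.nn as nn

class ConditionalAbstractGenerator(nn.Module):
    def __init__(self, vocab_size=30000, d_model=512, n_heads=8,
                 condition_layers=2, lm_layers=8):
        super().__init__()
        # Embeddings shared between condition tokens and abstract tokens for brevity.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.condition_stack = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=condition_layers)          # shallow "condition" encoder
        self.lm_stack = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=lm_layers)                 # deep "language model" stack
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, condition_ids, abstract_ids):
        memory = self.condition_stack(self.embed(condition_ids))
        causal = nn.Transformer.generate_square_subsequent_mask(abstract_ids.size(1))
        hidden = self.lm_stack(self.embed(abstract_ids), memory, tgt_mask=causal)
        return self.out(hidden)

model = ConditionalAbstractGenerator()
logits = model(torch.randint(0, 30000, (1, 16)), torch.randint(0, 30000, (1, 32)))
print(logits.shape)  # (1, 32, 30000)
```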
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.