A Comprehensive Evaluation of Large Language Models on Benchmark
Biomedical Text Processing Tasks
- URL: http://arxiv.org/abs/2310.04270v3
- Date: Mon, 19 Feb 2024 22:58:39 GMT
- Title: A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks
- Authors: Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang
- Abstract summary: This paper evaluates the performance of Large Language Models (LLMs) on benchmark biomedical tasks.
To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain.
- Score: 2.5027382653219155
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, Large Language Models (LLMs) have demonstrated impressive capability
to solve a wide range of tasks. However, despite their success across various
tasks, no prior work has investigated their capability in the biomedical domain
yet. To this end, this paper aims to evaluate the performance of LLMs on
benchmark biomedical tasks. For this purpose, we conduct a comprehensive
evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets.
To the best of our knowledge, this is the first work that conducts an extensive
evaluation and comparison of various LLMs in the biomedical domain.
Interestingly, we find based on our evaluation that in biomedical datasets that
have smaller training sets, zero-shot LLMs even outperform the current
state-of-the-art fine-tuned biomedical models. This suggests that pretraining
on large text corpora makes LLMs quite specialized even in the biomedical
domain. We also find that no single LLM outperforms the others in all
tasks; the performance of different LLMs varies depending on the task.
While their performance is still quite poor in comparison to the biomedical
models that were fine-tuned on large training sets, our findings demonstrate
that LLMs have the potential to be a valuable tool for various biomedical tasks
that lack large annotated data.
Related papers
- Evaluating Large Language Models for Public Health Classification and Extraction Tasks [0.3593941384437792]
We present evaluations of Large Language Models (LLMs) for public health tasks involving the classification and extraction of free text.
We initially evaluate five open-weight LLMs across all tasks using zero-shot in-context learning.
We find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources.
arXiv Detail & Related papers (2024-05-23T16:33:18Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains [8.448541067852]
Large Language Models (LLMs) have demonstrated remarkable versatility in recent years.
Despite the availability of various open-source LLMs tailored for health contexts, adapting general-purpose LLMs to the medical domain presents significant challenges.
We introduce BioMistral, an open-source LLM tailored for the biomedical domain, utilizing Mistral as its foundation model.
arXiv Detail & Related papers (2024-02-15T23:39:04Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' general-purpose language understanding and generation abilities are acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks [19.091278630792615]
Most existing biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks.
We present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks.
arXiv Detail & Related papers (2023-11-20T08:51:30Z)
- A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language.
This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
- Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning.
This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z)
- BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
- Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering [54.13933019557655]
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT).
LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules.
We found that medical textbooks are a more effective retrieval corpus than Wikipedia in the medical domain.
arXiv Detail & Related papers (2023-09-05T13:39:38Z)
- A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks [7.542019351929903]
We evaluate four state-of-the-art instruction-tuned large language models (LLMs) on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English.
arXiv Detail & Related papers (2023-07-22T15:58:17Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)