Related papers: A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks

URL: http://arxiv.org/abs/2310.04270v3
Date: Mon, 19 Feb 2024 22:58:39 GMT
Title: A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks
Authors: Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang
Abstract summary: This paper aims to evaluate the performance of Large Language Models (LLMs) on benchmark biomedical tasks. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain.
Score: 2.5027382653219155
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Recently, Large Language Models (LLMs) have demonstrated impressive capability to solve a wide range of tasks. However, despite their success across various tasks, no prior work has investigated their capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of LLMs on benchmark biomedical tasks. For this purpose, we conduct a comprehensive evaluation of 4 popular LLMs in 6 diverse biomedical tasks across 26 datasets. To the best of our knowledge, this is the first work that conducts an extensive evaluation and comparison of various LLMs in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot LLMs even outperform the current state-of-the-art fine-tuned biomedical models. This suggests that pretraining on large text corpora makes LLMs quite specialized even in the biomedical domain. We also find that not a single LLM can outperform other LLMs in all tasks, as the performance of different LLMs may vary depending on the task. While their performance is still quite poor in comparison to the biomedical models that were fine-tuned on large training sets, our findings demonstrate that LLMs have the potential to be a valuable tool for various biomedical tasks that lack large annotated data.

Related papers

Evaluating the Effectiveness of Cost-Efficient Large Language Models in Benchmark Biomedical Tasks [2.290986589177405]
This paper presents a comprehensive evaluation of cost-efficient Large Language Models (LLMs). We evaluated a range of closed-source and open-source LLMs on tasks such as biomedical text classification and generation, question answering, and multimodal image processing. Our experimental results offer valuable insights for selecting models that are optimally suited for specific biomedical applications.
arXiv Detail & Related papers (2025-07-18T16:13:35Z)
Improving Automatic Evaluation of Large Language Models (LLMs) in Biomedical Relation Extraction via LLMs-as-the-Judge [7.064104563689608]
Large Language Models (LLMs) have demonstrated impressive performance in biomedical relation extraction. This paper investigates the use of LLMs-as-the-Judge as an alternative evaluation method for biomedical relation extraction.
arXiv Detail & Related papers (2025-06-01T02:01:52Z)
An Empirical Study of Many-to-Many Summarization with Large Language Models [82.10000188179168]
Large language models (LLMs) have shown strong multi-lingual abilities, giving them the potential to perform Many-to-many summarization (M2MS) in real applications. This work presents a systematic empirical study on LLMs' M2MS ability.
arXiv Detail & Related papers (2025-05-19T11:18:54Z)
JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models [29.92429306565324]
We propose a new benchmark for evaluating Japanese biomedical large language models (LLMs). Experimental results indicate that LLMs with a better understanding of Japanese and richer biomedical knowledge achieve better performance in Japanese biomedical tasks.
arXiv Detail & Related papers (2024-09-20T08:25:16Z)
A Survey for Large Language Models in Biomedicine [31.719451674137844]
This review is based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv. We explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine. We discuss the challenges that LLMs face in the biomedicine domain including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics.
arXiv Detail & Related papers (2024-08-29T12:39:16Z)
Biomedical Large Languages Models Seem not to be Superior to Generalist Models on Unseen Medical Data [3.469567586411153]
Large language models (LLMs) have shown potential in biomedical applications, leading to efforts to fine-tune them on domain-specific data. This study evaluates the performance of biomedically fine-tuned LLMs against their general-purpose counterparts on a variety of clinical tasks.
arXiv Detail & Related papers (2024-08-25T13:36:22Z)
An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks [19.091278630792615]
Most existing biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. We present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks.
arXiv Detail & Related papers (2023-11-20T08:51:30Z)
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge [85.09998659355038]
Large language models (LLMs) have received substantial attention due to their capabilities for understanding and generating human language. This review aims to provide a detailed overview of the development and deployment of LLMs in medicine.
arXiv Detail & Related papers (2023-11-09T02:55:58Z)
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review [16.008511195589925]
Large language models (LLMs) have shown promising capabilities in mimicking human-level language comprehension and reasoning. This paper provides a comprehensive review on the applications and implications of LLMs in medicine.
arXiv Detail & Related papers (2023-11-03T13:51:36Z)
BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering (Published in Findings of EMNLP 2024) [48.17095875619711]
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT). LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules. We found that medical textbooks, used as a retrieval corpus, are a more effective knowledge database than Wikipedia in the medical domain.
arXiv Detail & Related papers (2023-09-05T13:39:38Z)
Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
Benchmarking large language models for biomedical natural language processing applications and recommendations [22.668383945059762]
Large Language Models (LLMs) have shown promise in general domains. We compare their zero-shot, few-shot, and fine-tuning performance with traditional fine-tuning of BERT or BART models. We find issues like missing information and hallucinations in LLM outputs.
arXiv Detail & Related papers (2023-05-10T13:40:06Z)
