An Evaluation of Large Language Models in Bioinformatics Research
- URL: http://arxiv.org/abs/2402.13714v1
- Date: Wed, 21 Feb 2024 11:27:31 GMT
- Title: An Evaluation of Large Language Models in Bioinformatics Research
- Authors: Hengchuang Yin, Zhonghui Gu, Fanhao Wang, Yiparemu Abuduhaibaier, Yanqiao Zhu, Xinming Tu, Xian-Sheng Hua, Xiao Luo, Yizhou Sun
- Abstract summary: We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
- Score: 52.100233156012756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) such as ChatGPT have gained considerable
interest across diverse research communities. Their notable ability for text
completion and generation has inaugurated a novel paradigm for
language-interfaced problem solving. However, the potential and efficacy of
these models in bioinformatics remain incompletely explored. In this work, we
study the performance of LLMs on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction
of named entities for genes and proteins, detection of antimicrobial and
anti-cancer peptides, molecular optimization, and resolution of educational
bioinformatics problems. Our findings indicate that, given appropriate prompts,
LLMs like GPT variants can successfully handle most of these tasks. In
addition, we provide a thorough analysis of their limitations in the context of
complicated bioinformatics tasks. In conclusion, we believe that this work can
provide new perspectives and motivate future research in the field of LLM
applications, AI for Science and bioinformatics.
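Of the tasks listed in the abstract, identifying potential coding regions is the most directly algorithmic. A conventional, non-LLM reference point against which a model's answer could be checked is an open-reading-frame (ORF) scan. The sketch below is not the authors' method or evaluation protocol; the function name `find_orfs`, the 30-codon default minimum length, and the restriction to the forward strand with the standard start/stop codons are illustrative assumptions.

```python
# Hypothetical baseline (not from the paper): forward-strand ORF scan.
# Assumes the standard genetic code: start ATG, stops TAA/TAG/TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF on the forward strand.

    `start` is the 0-based index of the ATG; `end` is exclusive and
    includes the stop codon. `min_codons` (an assumed threshold) counts
    codons from the ATG through the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    # ATG AAA TTT TAA in frame 2 -> M K F *
    print(find_orfs("CCATGAAATTTTAAGG", min_codons=1))  # [(2, 2, 14)]
```

Scanning the reverse strand would mean running the same function on the reverse complement and mapping coordinates back; that step is left out to keep the sketch short.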
Related papers
- NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering [0.14999444543328289]
We introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization.
We employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction.
(2024-10-29T14:45:12Z)
- A Survey for Large Language Models in Biomedicine [31.719451674137844]
This review is based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv.
We explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine.
We discuss the challenges that LLMs face in the biomedicine domain including data privacy concerns, limited model interpretability, issues with dataset quality, and ethics.
(2024-08-29T12:39:16Z)
- Multimodal Large Language Models for Bioimage Analysis [39.120941702559726]
Multimodal Large Language Models (MLLMs) exhibit strong emergent capacities, such as understanding, analyzing, reasoning, and generalization.
With these capabilities, MLLMs hold promise to extract intricate information from biological images and data obtained through various modalities.
Development of MLLMs shows increasing promise in serving as intelligent assistants or agents for augmenting human researchers in biology research.
(2024-07-29T08:21:25Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
(2024-03-03T14:59:47Z)
- Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundation models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
The review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
(2024-02-06T02:29:17Z)
- Large language models in bioinformatics: applications and perspectives [14.16418711188321]
Large language models (LLMs) are artificial intelligence models based on deep learning.
This review focuses on exploring the applications of large language models in genomics, transcriptomics, drug discovery and single cell analysis.
(2024-01-08T17:26:59Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
(2023-12-21T14:26:57Z)
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
(2023-11-01T14:44:01Z)
- Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
(2020-08-05T15:45:54Z)