Leveraging Biomolecule and Natural Language through Multi-Modal
Learning: A Survey
- URL: http://arxiv.org/abs/2403.01528v2
- Date: Tue, 5 Mar 2024 11:12:47 GMT
- Title: Leveraging Biomolecule and Natural Language through Multi-Modal
Learning: A Survey
- Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao
Qin, and Rui Yan
- Abstract summary: The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
- Score: 75.47055414002571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of biomolecular modeling with natural language (BL) has
emerged as a promising interdisciplinary area at the intersection of artificial
intelligence, chemistry and biology. This approach leverages the rich,
multifaceted descriptions of biomolecules contained within textual data sources
to enhance our fundamental understanding and enable downstream computational
tasks such as biomolecule property prediction. The fusion of the nuanced
narratives expressed through natural language with the structural and
functional specifics of biomolecules described via various molecular modeling
techniques opens new avenues for comprehensively representing and analyzing
biomolecules. By incorporating the contextual language data that surrounds
biomolecules into their modeling, BL aims to capture a holistic view
encompassing both the symbolic qualities conveyed through language as well as
quantitative structural characteristics. In this review, we provide an
extensive analysis of recent advancements achieved through cross modeling of
biomolecules and natural language. (1) We begin by outlining the technical
representations of biomolecules employed, including sequences, 2D graphs, and
3D structures. (2) We then examine in depth the rationale and key objectives
underlying effective multi-modal integration of language and molecular data
sources. (3) We subsequently survey the practical applications enabled to date
in this developing research area. (4) We also compile and summarize the
available resources and datasets to facilitate future work. (5) Looking ahead,
we identify several promising research directions worthy of further exploration
and investment to continue advancing the field. The related resources and
contents are continuously updated at
\url{https://github.com/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling}.
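The survey's first axis, the representation levels named in point (1) above (sequences, 2D graphs, and 3D structures), can be made concrete with a short sketch. The example below uses RDKit and an aspirin SMILES string purely as an illustration; the toolkit and molecule are assumptions, not prescribed by the survey.

```python
# Minimal sketch: three common small-molecule representations from one SMILES string.
# Assumes RDKit is installed; the toolkit and example molecule are illustrative only,
# not taken from the survey itself.
from rdkit import Chem
from rdkit.Chem import AllChem

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, as a 1D sequence representation
mol = Chem.MolFromSmiles(smiles)

# 2D graph view: atoms as nodes, bonds as edges
nodes = [atom.GetSymbol() for atom in mol.GetAtoms()]
edges = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx(), b.GetBondTypeAsDouble())
         for b in mol.GetBonds()]
print(f"{len(nodes)} atoms, {len(edges)} bonds")

# 3D structure view: embed a conformer and read Cartesian coordinates
mol3d = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol3d, randomSeed=42)  # generate 3D coordinates
conf = mol3d.GetConformer()
coords = [(p.x, p.y, p.z) for p in (conf.GetAtomPosition(i)
                                    for i in range(mol3d.GetNumAtoms()))]
print("first atom position:", coords[0])
```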
Related papers
- MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction [44.27112553103388]
We present Molecule Caption Arena: the first comprehensive benchmark of large language model (LLM)-augmented molecular property prediction.
We evaluate over twenty LLMs, including both general-purpose and domain-specific molecule captioners, across diverse prediction tasks.
Our findings confirm the ability of LLM-extracted knowledge to enhance state-of-the-art molecular representations.
arXiv Detail & Related papers (2024-11-01T17:03:16Z)
- Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule [16.641797535842752]
In this paper, we present the first systematic survey on multimodal frameworks for molecule research.
We begin with the development of molecular deep learning and point out the necessity to involve textual modality.
Furthermore, we delve into the utilization of large language models and prompting techniques for molecular tasks and present significant applications in drug discovery.
arXiv Detail & Related papers (2024-03-07T03:03:13Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
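Since the adapter-based knowledge injection described in the entry above is a recurring recipe in this line of work, a minimal sketch may help. The bottleneck adapter below is a generic PyTorch illustration, not that paper's actual implementation; the layer sizes and the suggestion of training on knowledge-graph-derived text are assumptions.

```python
# Minimal sketch of a bottleneck adapter layer, the kind of lightweight module
# such approaches insert into a frozen pre-trained language model. Dimensions
# are arbitrary; this is not the exact architecture used in the paper above.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen PLM's representation intact;
        # only the small adapter weights are trained on the knowledge source
        # (e.g., text verbalized from a biomedical knowledge graph such as UMLS).
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter()
dummy = torch.randn(2, 16, 768)  # (batch, sequence, hidden)
print(adapter(dummy).shape)      # torch.Size([2, 16, 768])
```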
- BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations [54.97423244799579]
$\mathbf{BioT5}$ is a pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations.
$\mathbf{BioT5}$ distinguishes between structured and unstructured knowledge, leading to more effective utilization of information.
arXiv Detail & Related papers (2023-10-11T07:57:08Z)
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task adopting natural language for describing and editing target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
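As a closing illustration of the cross-modal integration objective that runs through the survey and several of the entries above, the sketch below shows one common alignment recipe: a symmetric contrastive (CLIP-style InfoNCE) loss between molecule and text embeddings in a shared space. The placeholder encoders and the choice of this particular loss are assumptions; none of the listed papers is claimed to use exactly this code.

```python
# Minimal sketch of a CLIP-style contrastive objective for aligning molecule and
# text embeddings. The "encoders" here are placeholder linear projections; real
# systems would use a graph or SMILES encoder and a language model. Illustrative
# recipe only, not any listed paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MolTextAligner(nn.Module):
    def __init__(self, mol_dim: int = 300, text_dim: int = 768, proj_dim: int = 256):
        super().__init__()
        self.mol_proj = nn.Linear(mol_dim, proj_dim)    # stand-in molecule encoder
        self.text_proj = nn.Linear(text_dim, proj_dim)  # stand-in text encoder
        self.temperature = 0.07

    def forward(self, mol_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        m = F.normalize(self.mol_proj(mol_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = m @ t.t() / self.temperature     # pairwise similarities
        targets = torch.arange(m.size(0))         # i-th molecule pairs with i-th text
        # Symmetric InfoNCE: molecule-to-text and text-to-molecule directions
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

aligner = MolTextAligner()
loss = aligner(torch.randn(8, 300), torch.randn(8, 768))
print(float(loss))
```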
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.