A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language
- URL: http://arxiv.org/abs/2209.05481v1
- Date: Mon, 12 Sep 2022 00:56:57 GMT
- Authors: Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao
Sun, Zhiwu Lu, Ji-Rong Wen
- Abstract summary: We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although artificial intelligence (AI) has made significant progress in
understanding molecules in a wide range of fields, existing models generally
acquire a single cognitive ability from a single molecular modality. Because
molecular knowledge is deeply hierarchical, even humans learn from different
modalities, including both intuitive diagrams and professional texts,
to aid their understanding. Inspired by this, we propose a molecular
multimodal foundation model which is pretrained from molecular graphs and their
semantically related textual data (crawled from published Scientific Citation
Index papers) via contrastive learning. This AI model represents a critical
attempt that directly bridges molecular graphs and natural language.
Importantly, through capturing the specific and complementary information of
the two modalities, our proposed model can better grasp molecular expertise.
Experimental results show that our model not only exhibits promising
performance on cross-modal tasks such as cross-modal retrieval and molecule
captioning, but also enhances molecular property prediction and can
generate meaningful molecular graphs from natural language
descriptions. We believe that our model would have a broad impact on
AI-empowered fields across disciplines such as biology, chemistry, materials,
environment, and medicine, among others.
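The abstract describes contrastive pretraining that aligns molecular-graph and text embeddings. As a minimal illustrative sketch (not the authors' code), the following shows a symmetric InfoNCE-style objective over a batch of paired embeddings; the array shapes, the `temperature` value, and the stand-in encoder outputs are all assumptions:

```python
import numpy as np

def info_nce_loss(graph_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    graph_emb, text_emb: (batch, dim) arrays; row i of each is assumed
    to be a matched molecule-graph / text pair (hypothetical encoder outputs).
    """
    # L2-normalize so the dot product is cosine similarity
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = g @ t.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(g))       # matched pairs lie on the diagonal

    def cross_entropy(logits, labels):
        # numerically stable log-softmax over each row
        shifted = logits - logits.max(axis=1, keepdims=True)
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # contrast graphs against texts and texts against graphs
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each graph embedding toward its paired text embedding and pushes it away from the other texts in the batch, which is the general mechanism behind CLIP-style cross-modal pretraining.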
Related papers
- Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
Mol-LLaMA is a large molecular language model that grasps the general knowledge centered on molecules via multi-modal instruction tuning.
To improve understanding of molecular features, we introduce a module that integrates complementary information from different molecular encoders.
arXiv Detail & Related papers (2025-02-19T05:49:10Z)
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in how molecular graphs are encoded.
KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
- Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge [14.08112359246334]
We present MV-Mol, a representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs.
We show that MV-Mol provides improved representations that substantially benefit molecular property prediction.
arXiv Detail & Related papers (2024-06-14T08:48:10Z)
- LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space [55.5427001668863]
We present a novel latent diffusion model dubbed LDMol for text-conditioned molecule generation.
LDMol comprises a molecule autoencoder that produces a learnable and structurally informative feature space.
We show that LDMol can be applied to downstream tasks such as molecule-to-text retrieval and text-guided molecule editing.
arXiv Detail & Related papers (2024-05-28T04:59:13Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross-modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text [25.979382232281786]
We introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information.
We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity.
arXiv Detail & Related papers (2023-08-14T03:12:29Z)
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task that uses natural language to describe and edit target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
- MolFM: A Multimodal Molecular Foundation Model [9.934141536012596]
MolFM is a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs.
We provide a theoretical analysis showing that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule.
On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively.
arXiv Detail & Related papers (2023-06-06T12:45:15Z)
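Several of the papers above report cross-modal retrieval results (e.g. MolFM's zero-shot and fine-tuned gains). As a hypothetical sketch of how such retrieval is commonly scored, not code from any of the listed papers, the following computes Recall@k by ranking gallery embeddings by cosine similarity, assuming query i's true match is gallery row i:

```python
import numpy as np

def retrieval_recall_at_k(query_emb, gallery_emb, k=1):
    """Recall@k for cross-modal retrieval.

    query_emb:   (n, dim) embeddings of one modality (e.g. graphs)
    gallery_emb: (n, dim) embeddings of the other (e.g. texts);
                 row i of the gallery is assumed to match query i.
    """
    # cosine similarity via L2-normalized dot products
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T

    # indices of the top-k most similar gallery items per query
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = (topk == np.arange(len(q))[:, None]).any(axis=1)
    return hits.mean()
```

With perfectly aligned embeddings the metric is 1.0; an "absolute gain" such as MolFM's 12.13% refers to a difference in this kind of percentage score between models.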
This list is automatically generated from the titles and abstracts of the papers on this site.