Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge
- URL: http://arxiv.org/abs/2406.09841v1
- Date: Fri, 14 Jun 2024 08:48:10 GMT
- Title: Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge
- Authors: Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zikun Nie, Hao Zhou, Zaiqing Nie
- Abstract summary: We present MV-Mol, a representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs.
We show that MV-Mol provides improved representations that substantially benefit molecular property prediction.
- Score: 14.08112359246334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular representations, due to challenges in explicitly incorporating view information and handling molecular knowledge from heterogeneous sources. To address these issues, we present MV-Mol, a molecular representation learning model that harvests multi-view molecular expertise from chemical structures, unstructured knowledge from biomedical texts, and structured knowledge from knowledge graphs. We utilize text prompts to model view information and design a fusion architecture to extract view-based molecular representations. We develop a two-stage pre-training procedure, exploiting heterogeneous data of varying quality and quantity. Through extensive experiments, we show that MV-Mol provides improved representations that substantially benefit molecular property prediction. Additionally, MV-Mol exhibits state-of-the-art performance in multi-modal comprehension of molecular structures and texts. Code and data are available at https://github.com/PharMolix/OpenBioMed.
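As a rough illustration of what "using text prompts to model view information" with a fusion architecture could look like, here is a minimal PyTorch sketch of view-conditioned fusion via cross-attention. It is a toy under assumed shapes, and every name in it is invented; it is not the authors' architecture, whose actual code lives in the linked repository.

```python
# Minimal sketch (not the authors' released code) of view-conditioned fusion:
# a text prompt encoding the desired view queries per-atom structure tokens
# via cross-attention. All names and shapes here are hypothetical.
import torch
import torch.nn as nn

class ViewConditionedFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, prompt_emb: torch.Tensor, struct_emb: torch.Tensor) -> torch.Tensor:
        # prompt_emb: (batch, prompt_len, dim) -- encoded view prompt, e.g. a
        #   sentence describing a pharmacological vs. physicochemical view
        # struct_emb: (batch, num_atoms, dim)  -- per-atom structure embeddings
        fused, _ = self.attn(query=prompt_emb, key=struct_emb, value=struct_emb)
        fused = self.norm(fused + prompt_emb)  # residual over the prompt stream
        return fused.mean(dim=1)               # one view-specific vector per molecule

fusion = ViewConditionedFusion()
prompt = torch.randn(2, 8, 256)    # two prompts, 8 tokens each
struct = torch.randn(2, 30, 256)   # two molecules, 30 atoms each
view_repr = fusion(prompt, struct)  # -> shape (2, 256)
```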
Related papers
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM), a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
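As a purely illustrative aside on what "functional-group awareness" can mean in practice, the sketch below tags each atom with a functional-group label via SMARTS matching in RDKit. The pattern table and the atom_fg_labels function are hypothetical stand-ins, not FARM's pipeline.

```python
# Illustrative sketch (not FARM's pipeline): tag each atom with the
# functional group it belongs to, using a few hand-picked SMARTS patterns.
from rdkit import Chem

# Ordered from generic to specific so that more specific matches win on overlap.
FG_SMARTS = {
    "hydroxyl": "[OX2H]",
    "amine": "[NX3;H2,H1;!$(NC=O)]",
    "carboxylic_acid": "C(=O)[OH]",
}

def atom_fg_labels(smiles: str) -> list[str]:
    mol = Chem.MolFromSmiles(smiles)
    labels = ["none"] * mol.GetNumAtoms()
    for name, smarts in FG_SMARTS.items():
        pattern = Chem.MolFromSmarts(smarts)
        for match in mol.GetSubstructMatches(pattern):
            for idx in match:
                labels[idx] = name
    return labels

# Acetic acid: the carbonyl carbon and both oxygens get the carboxyl label.
print(atom_fg_labels("CC(=O)O"))
# -> ['none', 'carboxylic_acid', 'carboxylic_acid', 'carboxylic_acid']
```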
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- Advancements in Molecular Property Prediction: A Survey of Single and Multimodal Approaches [1.0446041735532203]
Molecular Property Prediction (MPP) plays a pivotal role across diverse domains, spanning drug discovery, material science, and environmental chemistry.
Recent years have witnessed remarkable strides in MPP, fueled by the exponential growth of chemical data and the evolution of artificial intelligence.
This article explores recent AI-based approaches to MPP, focusing on both single- and multi-modality representation techniques.
arXiv Detail & Related papers (2024-08-18T12:49:52Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
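To make "multi-level token embeddings" concrete, here is a heavily hedged toy: a shared coarse token plus a per-molecule fine token, intended to be prepended to a frozen language model's input embeddings. This guesses at the general shape of the idea; it is not HI-Mol's implementation, and every name is invented.

```python
# Hedged toy of "multi-level token embeddings" (names invented, not HI-Mol's code):
# one coarse token shared across the dataset plus one fine token per molecule.
import torch
import torch.nn as nn

class MultiLevelTokens(nn.Module):
    def __init__(self, num_molecules: int, dim: int = 256):
        super().__init__()
        self.coarse = nn.Parameter(torch.randn(1, dim))  # dataset-level concept token
        self.fine = nn.Embedding(num_molecules, dim)     # molecule-level detail tokens

    def forward(self, mol_ids: torch.Tensor) -> torch.Tensor:
        # Returns a (batch, 2, dim) prefix of [coarse, fine] embeddings,
        # meant to be prepended to a frozen language model's input embeddings.
        coarse = self.coarse.expand(mol_ids.size(0), -1).unsqueeze(1)
        fine = self.fine(mol_ids).unsqueeze(1)
        return torch.cat([coarse, fine], dim=1)

tokens = MultiLevelTokens(num_molecules=100)
prefix = tokens(torch.tensor([0, 1]))  # -> shape (2, 2, 256)
```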
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce TSMMG, a multi-constraint molecular generation large language model that, akin to a student, learns from various 'teachers'.
To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these teachers.
We experimentally show that TSMMG performs remarkably well in generating molecules that meet complex property requirements described in natural language.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross-modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
- From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning [10.025809630976065]
This paper introduces a novel pre-training framework that learns robust and generalizable chemical knowledge.
Our approach demonstrates competitive performance across various molecular property benchmarks.
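The three granularities in the title are easy to see with RDKit: the full molecule, its Murcko scaffold, and its functional groups. The snippet below (illustrative only, not the paper's code) extracts the scaffold channel for aspirin.

```python
# Illustrative only (not the paper's code): extracting the scaffold-level
# "channel" of a molecule -- the ring system with side chains stripped.
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
scaffold = MurckoScaffold.GetScaffoldForMol(mol)    # strips side chains, keeps rings
print(Chem.MolToSmiles(scaffold))                   # -> c1ccccc1 (benzene core)
```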
arXiv Detail & Related papers (2023-11-05T23:47:52Z)
- Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose conversational molecular design, a novel task that adopts natural language for describing and editing target molecules.
To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
- MolFM: A Multimodal Molecular Foundation Model [9.934141536012596]
MolFM is a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs.
We provide a theoretical analysis showing that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in feature space between different modalities of the same molecule.
On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively.
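The stated objective, minimizing feature-space distance between modalities of the same molecule, is commonly implemented as a symmetric InfoNCE contrastive loss; the following is a generic sketch of that idea, not necessarily MolFM's exact objective.

```python
# Generic symmetric InfoNCE sketch of cross-modal alignment (not necessarily
# MolFM's exact loss): same-molecule modality pairs are pulled together,
# different molecules pushed apart.
import torch
import torch.nn.functional as F

def cross_modal_infonce(struct: torch.Tensor, text: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    # struct, text: (batch, dim) embeddings of the same molecules, row-aligned.
    s = F.normalize(struct, dim=-1)
    t = F.normalize(text, dim=-1)
    logits = s @ t.T / tau                             # (batch, batch) cosine similarities
    labels = torch.arange(s.size(0), device=s.device)  # matching pairs on the diagonal
    # Average the structure-to-text and text-to-structure directions.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

loss = cross_modal_infonce(torch.randn(8, 256), torch.randn(8, 256))
```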
arXiv Detail & Related papers (2023-06-06T12:45:15Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model pretrained on molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Knowledge-aware Contrastive Molecular Graph Learning [5.08771973600915]
We propose Contrastive Knowledge-aware GNN (CKGNN) for self-supervised molecular representation learning.
We explicitly encode domain knowledge via a knowledge-aware molecular encoder under the contrastive learning framework.
Experiments on 8 public datasets demonstrate the effectiveness of our model with a 6% absolute improvement on average.
arXiv Detail & Related papers (2021-03-24T08:55:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.