MolCAP: Molecular Chemical reActivity pretraining and
prompted-finetuning enhanced molecular representation learning
- URL: http://arxiv.org/abs/2306.09187v1
- Date: Tue, 13 Jun 2023 13:48:06 GMT
- Title: MolCAP: Molecular Chemical reActivity pretraining and
prompted-finetuning enhanced molecular representation learning
- Authors: Yu Wang, JingJie Zhang, Junru Jin, and Leyi Wei
- Abstract summary: MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
- Score: 3.179128580341411
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Molecular representation learning (MRL) is a fundamental task for drug
discovery. However, previous deep-learning (DL) methods focus excessively on
learning robust inner-molecular representations through mask-dominated pretraining
frameworks, neglecting the abundant chemical reactivity relationships between molecules that
have been demonstrated as the determining factor for various molecular property
prediction tasks. Here, we present MolCAP to promote MRL, a graph pretraining
Transformer based on chemical reactivity (IMR) knowledge with prompted
finetuning. Results show that MolCAP outperforms the compared methods based on
traditional molecular pretraining frameworks on 13 publicly available molecular
datasets across a diversity of biomedical tasks. Prompted by MolCAP, even basic
graph neural networks are capable of achieving surprising performance that
outperforms previous models, indicating the promising prospect of applying
reactivity information for MRL. In addition, manually designed molecular templates
have the potential to uncover dataset bias. All in all, we expect MolCAP to
yield more chemically meaningful insights for the entire process of drug
discovery.
Related papers
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
- Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
- MolKD: Distilling Cross-Modal Knowledge in Chemical Reactions for
Molecular Property Prediction [9.100067773907403]
How to effectively represent molecules is a long-standing challenge for molecular property prediction and drug discovery.
This paper proposes to incorporate chemical domain knowledge, specifically related to chemical reactions, for learning effective molecular representations.
We introduce a novel method, namely MolKD, which Distills cross-modal Knowledge in chemical reactions to assist Molecular property prediction.
arXiv Detail & Related papers (2023-05-03T06:01:03Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular
Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step in building the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- KPGT: Knowledge-Guided Pre-training of Graph Transformer for Molecular
Property Prediction [13.55018269009361]
We introduce Knowledge-guided Pre-training of Graph Transformer (KPGT), a novel self-supervised learning framework for molecular graph representation learning.
KPGT can offer superior performance over current state-of-the-art methods on several molecular property prediction tasks.
arXiv Detail & Related papers (2022-06-02T08:22:14Z)
- Do Large Scale Molecular Language Representations Capture Important
Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
- Few-Shot Graph Learning for Molecular Property Prediction [46.60746023179724]
We propose Meta-MGNN, a novel model for few-shot molecular property prediction.
To exploit unlabeled molecular information, Meta-MGNN further incorporates molecular structure- and attribute-based self-supervised modules and self-attentive task weights.
Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.
arXiv Detail & Related papers (2021-02-16T01:55:34Z)
- Learn molecular representations from large-scale unlabeled molecules for
drug discovery [19.222413268610808]
The Molecular Pre-training Graph-based deep learning framework, named MPG, learns molecular representations from large-scale unlabeled molecules.
MolGNet can capture valuable chemistry insights to produce interpretable representation.
MPG is promising to become a novel approach in the drug discovery pipeline.
arXiv Detail & Related papers (2020-12-21T08:21:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.