BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular
Representation
- URL: http://arxiv.org/abs/2211.13979v3
- Date: Mon, 6 Nov 2023 03:01:57 GMT
- Title: BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular
Representation
- Authors: Zhen Wang, Zheng Feng, Yanjun Li, Bowen Li, Yongrui Wang, Chulin Sha,
Min He, Xiaolin Li
- Abstract summary: We propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations.
BatmanNet features two tailored complementary and asymmetric graph autoencoders to reconstruct the missing nodes and edges.
It achieves state-of-the-art results for multiple drug discovery tasks, including molecular properties prediction, drug-drug interaction, and drug-target interaction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although substantial efforts have been made using graph neural networks
(GNNs) for AI-driven drug discovery (AIDD), effective molecular representation
learning remains an open challenge, especially in the case of insufficient
labeled molecules. Recent studies suggest that big GNN models pre-trained by
self-supervised learning on unlabeled datasets enable better transfer
performance in downstream molecular property prediction tasks. However, the
approaches in these studies require multiple complex self-supervised tasks and
large-scale datasets, which are time-consuming, computationally expensive, and
difficult to pre-train end-to-end. Here, we design a simple yet effective
self-supervised strategy to simultaneously learn local and global information
about molecules, and further propose a novel bi-branch masked graph transformer
autoencoder (BatmanNet) to learn molecular representations. BatmanNet features
two tailored complementary and asymmetric graph autoencoders to reconstruct the
missing nodes and edges, respectively, from a masked molecular graph. With this
design, BatmanNet can effectively capture the underlying structure and semantic
information of molecules, thus improving the performance of molecular
representation. BatmanNet achieves state-of-the-art results for multiple drug
discovery tasks, including molecular properties prediction, drug-drug
interaction, and drug-target interaction, on 13 benchmark datasets,
demonstrating its great potential and superiority in molecular representation
learning.
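The abstract's core idea — mask part of a molecular graph and train two complementary branches to reconstruct the missing nodes and edges — can be illustrated with a toy sketch. This is not the paper's actual architecture: the real BatmanNet uses transformer-based asymmetric encoder-decoders, while here both branches are simple linear maps over a mean-aggregation encoder, and all dimensions and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy molecular graph: 6 atoms with 4-dim features and a symmetric adjacency.
num_nodes, feat_dim = 6, 4
X = rng.normal(size=(num_nodes, feat_dim))          # node (atom) features
A = rng.random((num_nodes, num_nodes)) < 0.4
A = np.triu(A, 1)
A = (A | A.T).astype(float)                         # undirected bonds

# Masking: hide a subset of nodes; masked graph autoencoding trains the
# model to reconstruct what was hidden (BatmanNet does this with separate
# node and edge branches).
node_mask = np.zeros(num_nodes, dtype=bool)
node_mask[rng.choice(num_nodes, 3, replace=False)] = True
X_vis = np.where(node_mask[:, None], 0.0, X)        # zero out masked nodes

# One message-passing "encoder" step: mean-aggregate visible neighbours.
deg = A.sum(1, keepdims=True) + 1.0
H = (X_vis + A @ X_vis) / deg

# Node branch: linear decoder predicting the masked node features.
W_node = rng.normal(scale=0.1, size=(feat_dim, feat_dim))
X_hat = H @ W_node
node_loss = np.mean((X_hat[node_mask] - X[node_mask]) ** 2)

# Edge branch: score every node pair and compare against the adjacency.
edge_prob = 1.0 / (1.0 + np.exp(-(H @ H.T)))        # sigmoid of pair scores
edge_loss = np.mean((edge_prob - A) ** 2)           # toy reconstruction loss

loss = node_loss + edge_loss                        # joint self-supervised objective
```

In training, gradients of this joint loss would update the encoder and both decoders, so the encoder must capture both local (node-level) and global (connectivity) structure to do well on either branch.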
Related papers
- Molecular Graph Representation Learning via Structural Similarity Information [11.38130169319915]
We introduce the Structural Similarity Motif GNN (MSSM-GNN), a novel molecular graph representation learning method.
In particular, we propose a specially designed graph that leverages graph kernel algorithms to represent the similarity between molecules quantitatively.
We employ GNNs to learn feature representations from molecular graphs, aiming to enhance the accuracy of property prediction by incorporating additional molecular representation information.
arXiv Detail & Related papers (2024-09-13T06:59:10Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule
Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, with an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE improvement of 35.1% on regression tasks.
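The contrastive pre-training this entry describes can be sketched with a minimal InfoNCE-style loss, where embeddings of the same molecule from two views (here, hypothetically, its molecular graph and its knowledge-graph substructure) are pulled together and mismatched pairs pushed apart. The embeddings, dimensions, and temperature below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings of 4 molecules from two views; the second view is
# a noisy copy of the first, so matching rows are correlated positives.
z_graph = rng.normal(size=(4, 8))
z_kg = z_graph + 0.1 * rng.normal(size=(4, 8))

def l2norm(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(za, zb, tau=0.2):
    """InfoNCE contrastive loss: row i of za and row i of zb are positives."""
    sim = l2norm(za) @ l2norm(zb).T / tau           # cosine similarity / temperature
    sim -= sim.max(axis=1, keepdims=True)           # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # -log softmax of positives

loss = info_nce(z_graph, z_kg)
```

Because the positive pairs are far more similar than random pairs, the loss for correctly matched views is much lower than for shuffled views, which is exactly the signal a contrastive pre-trainer optimizes.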
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - RGCVAE: Relational Graph Conditioned Variational Autoencoder for
Molecule Design [70.59828655929194]
Deep Graph Variational Autoencoders are among the most powerful machine learning tools for addressing the molecule design problem.
We propose RGCVAE, an efficient and effective Graph Variational Autoencoder based on: (i) an encoding network exploiting a new powerful Graph Isomorphism Network; (ii) a novel probabilistic decoding component.
arXiv Detail & Related papers (2023-05-19T14:23:48Z) - MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular
Representation Learning [77.31492888819935]
We propose a novel "pre-train, prompt, fine-tune" paradigm for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z) - HiGNN: Hierarchical Informative Graph Neural Networks for Molecular
Property Prediction Equipped with Feature-Wise Attention [5.735627221409312]
We propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for molecular property prediction.
Experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark datasets.
arXiv Detail & Related papers (2022-08-30T05:16:15Z) - Attention-wise masked graph contrastive learning for predicting
molecular property [15.387677968070912]
We propose a self-supervised representation learning framework for large-scale unlabeled molecules.
We develop a novel molecular graph augmentation strategy, referred to as the attention-wise graph mask.
Our model can capture important molecular structure and higher-order semantic information.
arXiv Detail & Related papers (2022-05-02T00:28:02Z) - Learn molecular representations from large-scale unlabeled molecules for
drug discovery [19.222413268610808]
The Molecular Pre-training Graph-based deep learning framework, named MPG, learns molecular representations from large-scale unlabeled molecules.
MolGNet can capture valuable chemistry insights to produce interpretable representations.
MPG shows promise as a novel approach in the drug discovery pipeline.
arXiv Detail & Related papers (2020-12-21T08:21:49Z) - Advanced Graph and Sequence Neural Networks for Molecular Property
Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z) - ASGN: An Active Semi-supervised Graph Neural Network for Molecular
Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn general representations that jointly exploit information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z) - Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)