3D Graph Contrastive Learning for Molecular Property Prediction
- URL: http://arxiv.org/abs/2208.06360v2
- Date: Thu, 18 Aug 2022 13:10:50 GMT
- Title: 3D Graph Contrastive Learning for Molecular Property Prediction
- Authors: Kisung Moon, Sunyoung Kwon
- Abstract summary: Self-supervised learning (SSL) is a method that learns the data representation by utilizing supervision inherent in the data.
We propose a novel contrastive learning framework, small-scale 3D Graph Contrastive Learning (3DGCL) for molecular property prediction.
- Score: 1.0152838128195467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) is a method that learns data
representations by exploiting supervision inherent in the data. This learning
method is in the spotlight in the drug field, which lacks annotated data
because experiments are time-consuming and expensive. SSL on enormous amounts
of unlabeled data has shown excellent performance for molecular property
prediction, but a few issues remain. (1) Existing SSL models are large-scale,
so SSL is hard to implement where computing resources are insufficient. (2) In
most cases, they do not utilize 3D structural information for molecular
representation learning, even though the activity of a drug is closely related
to the structure of the drug molecule; most current models use 3D information
not at all or only partially. (3) Previous models that apply contrastive
learning to molecules use augmentations that permute atoms and bonds, so
molecules with different characteristics can end up as positive samples. To
solve these problems, we propose a novel contrastive learning framework,
small-scale 3D Graph Contrastive Learning (3DGCL), for molecular property
prediction. 3DGCL learns molecular representations that reflect the molecule's
structure through a pre-training process that does not change the semantics of
the drug. Using only 1,128 samples of pre-training data and 1 million model
parameters, we achieve state-of-the-art or comparable performance on four
regression benchmark datasets. Extensive experiments demonstrate that 3D
structural information based on chemical knowledge is essential to molecular
representation learning for property prediction.
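As a point of reference (not the authors' code), semantics-preserving positive pairs such as the conformer pairs 3DGCL builds on are typically scored with an NT-Xent contrastive objective. Below is a minimal NumPy sketch of that loss; `z1` and `z2` stand for hypothetical encoder embeddings of two 3D conformers of each molecule, and the temperature value is an illustrative default.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, d) embeddings of two views of the same N molecules,
    e.g. GNN encodings of two 3D conformers per molecule, so the
    positive pair shares chemical semantics.
    """
    # Cosine similarity via L2 normalization
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)           # (2N, d)
    sim = z @ z.T / temperature                    # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)                 # exclude self-similarity
    n = len(z1)
    # Row i (view 1) pairs with row i+n (view 2), and vice versa
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-sum-exp over each row (softmax denominator)
    m = sim.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))
    # Cross-entropy: negative log softmax probability of the positive pair
    return (lse - sim[np.arange(2 * n), pos]).mean()
```

When the two views of each molecule agree (aligned embeddings), the loss is near zero; mismatching the pairs drives it up, which is the signal that pulls conformers of the same molecule together in embedding space.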
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- 3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information [1.1777304970289215]
3D-Mol is a novel approach designed for more accurate spatial structure representation.
It deconstructs molecules into three hierarchical graphs to better extract geometric information.
We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate our outstanding performance.
arXiv Detail & Related papers (2023-09-28T10:05:37Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Improving Molecular Contrastive Learning via Faulty Negative Mitigation and Decomposed Fragment Contrast [17.142976840521264]
We propose iMolCLR: improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs).
Experiments have shown that the proposed strategies significantly improve the performance of GNN models.
iMolCLR intrinsically embeds scaffolds and functional groups that can reason molecule similarities.
arXiv Detail & Related papers (2022-02-18T18:33:27Z)
- 3D Infomax improves GNNs for Molecular Property Prediction [1.9703625025720701]
We propose pre-training a model to reason about the geometry of molecules given only their 2D molecular graphs.
We show that 3D pre-training provides significant improvements for a wide range of properties.
arXiv Detail & Related papers (2021-10-08T13:30:49Z)
- Molecular machine learning with conformer ensembles [0.0]
We introduce multiple deep learning models that expand upon key architectures such as ChemProp and Schnet.
We then benchmark the performance trade-offs of these models on 2D, 3D and 4D representations in the prediction of drug activity.
The new architectures perform significantly better than 2D models, but their performance is often just as strong with a single conformer as with many.
arXiv Detail & Related papers (2020-12-15T17:44:48Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
- Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release [8.090016327163564]
One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules.
This data release encompasses structural information on 4.2 billion molecules and 60 TB of pre-computed data.
Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
arXiv Detail & Related papers (2020-05-28T01:33:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.