Pre-training Transformers for Molecular Property Prediction Using
Reaction Prediction
- URL: http://arxiv.org/abs/2207.02724v1
- Date: Wed, 6 Jul 2022 14:51:38 GMT
- Title: Pre-training Transformers for Molecular Property Prediction Using
Reaction Prediction
- Authors: Johan Broberg, Maria Bånkestad, Erik Ylipää
- Abstract summary: Transfer learning has had a tremendous impact in fields like Computer Vision and Natural Language Processing.
We present a pre-training procedure for molecular representation learning using reaction data.
We show a statistically significant positive effect on 5 of the 12 tasks compared to a non-pre-trained baseline model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular property prediction is essential in chemistry, especially for drug
discovery applications. However, available molecular property data is often
limited, encouraging the transfer of information from related data. Transfer
learning has had a tremendous impact in fields like Computer Vision and Natural
Language Processing, signaling its potential in molecular property
prediction. We present a pre-training procedure for molecular representation
learning using reaction data and use it to pre-train a SMILES Transformer. We
fine-tune and evaluate the pre-trained model on 12 molecular property
prediction tasks from MoleculeNet within physical chemistry, biophysics, and
physiology and show a statistically significant positive effect on 5 of the 12
tasks compared to a non-pre-trained baseline model.
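As a rough illustration of the pre-training setup described above (not the authors' actual code), reaction prediction can be framed as sequence-to-sequence learning over SMILES strings: the model reads the reactant SMILES and is trained to emit the product SMILES. The regex tokenizer and the example reaction below are hypothetical simplifications.

```python
# Sketch: framing reaction prediction as a seq2seq task over SMILES.
# The tokenizer pattern and the example reaction are illustrative only.
import re

# Common SMILES token pattern: bracket atoms first, then two-letter
# elements, then single-character tokens.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|Si|@@|[A-Za-z0-9@+\-=#$%()/\\.]")

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into model tokens."""
    return SMILES_TOKEN.findall(smiles)

def reaction_to_pair(rxn: str) -> tuple[list[str], list[str]]:
    """Turn a 'reactants>>products' reaction SMILES into (source, target)
    token sequences for encoder-decoder pre-training."""
    reactants, products = rxn.split(">>")
    return tokenize(reactants), tokenize(products)

# Esterification of acetic acid with ethanol (toy example).
src, tgt = reaction_to_pair("CC(=O)O.CCO>>CC(=O)OCC")
```

After pre-training on many such pairs, the encoder's learned representations can be fine-tuned on property-prediction labels.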
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
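The masking idea can be sketched as follows. Selecting one random contiguous SMILES span and replacing it with mask tokens is a deliberate simplification of the paper's functional-group-aware scheme; the mask token and span length are assumptions.

```python
# Sketch: masking a random contiguous SMILES subsequence for
# masked-language-model style pre-training. The mask token and span
# length here are illustrative choices, not the paper's exact scheme.
import random

MASK = "[MASK]"

def mask_span(tokens: list[str], span_len: int,
              rng: random.Random) -> tuple[list[str], list[str]]:
    """Replace one random contiguous span with MASK tokens; return the
    corrupted input and the original tokens as reconstruction labels."""
    start = rng.randrange(0, len(tokens) - span_len + 1)
    masked = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    return masked, tokens

masked, labels = mask_span(list("CCO"), 1, random.Random(0))
```

The model is then trained to reconstruct the original tokens at the masked positions, forcing it to infer local structure from context.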
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
With MolCAP prompting, even basic graph neural networks achieve surprising performance, outperforming previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Supervised Pretraining for Molecular Force Fields and Properties Prediction [16.86839767858162]
We propose to pretrain neural networks on a dataset of 86 million molecules, with atom charges and 3D geometries as inputs and molecular energies as labels.
Experiments show that, compared to training from scratch, fine-tuning the pretrained model can significantly improve the performance for seven molecular property prediction tasks and two force field tasks.
arXiv Detail & Related papers (2022-11-23T08:36:50Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
- KPGT: Knowledge-Guided Pre-training of Graph Transformer for Molecular Property Prediction [13.55018269009361]
We introduce Knowledge-guided Pre-training of Graph Transformer (KPGT), a novel self-supervised learning framework for molecular graph representation learning.
KPGT can offer superior performance over current state-of-the-art methods on several molecular property prediction tasks.
arXiv Detail & Related papers (2022-06-02T08:22:14Z)
- Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm, termed MROT, to enhance models' generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z)
- Analysis of training and seed bias in small molecules generated with a conditional graph-based variational autoencoder -- Insights for practical AI-driven molecule generation [0.0]
We analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE).
Our graph-based generative model is shown to excel in producing desired conditioned activities and favorable unconditioned physical properties in generated molecules.
arXiv Detail & Related papers (2021-07-19T16:00:05Z)
- Do Large Scale Molecular Language Representations Capture Important Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.