Pre-training with Fractional Denoising to Enhance Molecular Property Prediction
- URL: http://arxiv.org/abs/2407.11086v1
- Date: Sun, 14 Jul 2024 11:09:42 GMT
- Title: Pre-training with Fractional Denoising to Enhance Molecular Property Prediction
- Authors: Yuyan Ni, Shikun Feng, Xin Hong, Yuancheng Sun, Wei-Ying Ma, Zhi-Ming Ma, Qiwei Ye, Yanyan Lan,
- Abstract summary: We introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence.
Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks.
- Score: 26.93248595345132
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as an equivalent force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.
Related papers
- Sliced Denoising: A Physics-Informed Molecular Pre-Training Method [29.671249096191726]
This paper proposes a new method for molecular pre-training, called sliced denoising (SliDe)
SliDe uses a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations.
It shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods.
arXiv Detail & Related papers (2023-11-03T07:58:05Z) - Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z) - MolCAP: Molecular Chemical reActivity pretraining and
prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with
Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Denoise Pretraining on Nonequilibrium Molecules for Accurate and
Transferable Neural Potentials [8.048439531116367]
We propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions.
Our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems.
arXiv Detail & Related papers (2023-03-03T21:15:22Z) - MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular
Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT)
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium.
Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
arXiv Detail & Related papers (2022-05-31T22:28:34Z) - Accurate Machine Learned Quantum-Mechanical Force Fields for
Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z) - Analysis of training and seed bias in small molecules generated with a
conditional graph-based variational autoencoder -- Insights for practical
AI-driven molecule generation [0.0]
We analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE)
Our graph-based generative model is shown to excel in producing desired conditioned activities and favorable unconditioned physical properties in generated molecules.
arXiv Detail & Related papers (2021-07-19T16:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.