Related papers: Pre-training with Fractional Denoising to Enhance Molecular Property Prediction

URL: http://arxiv.org/abs/2407.11086v1
Date: Sun, 14 Jul 2024 11:09:42 GMT
Title: Pre-training with Fractional Denoising to Enhance Molecular Property Prediction
Authors: Yuyan Ni, Shikun Feng, Xin Hong, Yuancheng Sun, Wei-Ying Ma, Zhi-Ming Ma, Qiwei Ye, Yanyan Lan
Abstract summary: We introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks.
Score: 26.93248595345132
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as an equivalent force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.
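Illustration: the force-learning reading of denoising mentioned in the abstract can be sketched in a few lines. This is a minimal sketch and not the authors' implementation. It assumes the simplest case of isotropic Gaussian noise added to equilibrium coordinates, which is the restricted setting the abstract describes and not Frad's customizable noise. The function names (perturb, conditional_score, denoising_loss) and the toy coordinates are hypothetical. Under the further assumption that the molecular distribution is approximated by Gaussians centred on equilibrium structures, the score of the noisy sample is proportional to a force, so regressing the noise amounts to learning that force up to a scale factor.

```python
import random


def perturb(coords, sigma, rng):
    """Add isotropic Gaussian noise of scale sigma to every atom coordinate.

    Returns the noisy coordinates and the unit-variance noise that was drawn.
    """
    noise = [[rng.gauss(0.0, 1.0) for _ in atom] for atom in coords]
    noisy = [
        [c + sigma * e for c, e in zip(atom, eps)]
        for atom, eps in zip(coords, noise)
    ]
    return noisy, noise


def conditional_score(noisy, coords, sigma):
    """Gradient of log N(noisy; coords, sigma^2 I) with respect to noisy.

    Assumption: with an energy proportional to -log p, this gradient plays
    the role of a force pulling the structure back toward equilibrium.
    """
    return [
        [-(n - c) / sigma**2 for n, c in zip(noisy_atom, atom)]
        for noisy_atom, atom in zip(noisy, coords)
    ]


def denoising_loss(predicted_noise, noise):
    """Mean squared error between predicted and true noise."""
    diffs = [
        (p - e) ** 2
        for p_atom, e_atom in zip(predicted_noise, noise)
        for p, e in zip(p_atom, e_atom)
    ]
    return sum(diffs) / len(diffs)


# Toy two-atom "molecule" (hypothetical coordinates, arbitrary units).
rng = random.Random(0)
equilibrium = [[0.0, 0.0, 0.0], [1.1, 0.0, 0.0]]
sigma = 0.04
noisy, noise = perturb(equilibrium, sigma, rng)
score = conditional_score(noisy, equilibrium, sigma)
```

In this Gaussian case the score equals the negated noise divided by sigma, which is why a network trained to predict the noise is, up to scaling, trained to predict a force-like quantity.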

Related papers

Equivariant Masked Position Prediction for Efficient Molecular Representation [6.761418610103767]
Graph neural networks (GNNs) have shown considerable promise in computational chemistry. We introduce a novel self-supervised approach termed Equivariant Masked Position Prediction. EMPP formulates a nuanced position prediction task that is more well-defined and enhances the learning of quantum mechanical features.
arXiv Detail & Related papers (2025-02-12T08:39:26Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
Conditional Synthesis of 3D Molecules with Time Correction Sampler [58.0834973489875]
Time-Aware Conditional Synthesis (TACS) is a novel approach to conditional generation on diffusion models. It integrates adaptively controlled plug-and-play "online" guidance into a diffusion model, driving samples toward the desired properties.
arXiv Detail & Related papers (2024-11-01T12:59:25Z)
Sliced Denoising: A Physics-Informed Molecular Pre-Training Method [29.671249096191726]
This paper proposes a new method for molecular pre-training, called sliced denoising (SliDe). SliDe uses a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. It shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods.
arXiv Detail & Related papers (2023-11-03T07:58:05Z)
Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z)
MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning. Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems. DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
Denoise Pretraining on Nonequilibrium Molecules for Accurate and Transferable Neural Potentials [8.048439531116367]
We propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions. Our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems.
arXiv Detail & Related papers (2023-03-03T21:15:22Z)
MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT). MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt. Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium. Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
arXiv Detail & Related papers (2022-05-31T22:28:34Z)
Analysis of training and seed bias in small molecules generated with a conditional graph-based variational autoencoder -- Insights for practical AI-driven molecule generation [0.0]
We analyze the impact of seed and training bias on the output of an activity-conditioned graph-based variational autoencoder (VAE). Our graph-based generative model is shown to excel in producing desired conditioned activities and favorable unconditioned physical properties in generated molecules.
arXiv Detail & Related papers (2021-07-19T16:00:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.