Equivariant Pretrained Transformer for Unified Geometric Learning on
Multi-Domain 3D Molecules
- URL: http://arxiv.org/abs/2402.12714v1
- Date: Tue, 20 Feb 2024 04:40:00 GMT
- Title: Equivariant Pretrained Transformer for Unified Geometric Learning on
Multi-Domain 3D Molecules
- Authors: Rui Jiao, Xiangzhe Kong, Ziyang Yu, Wenbing Huang and Yang Liu
- Abstract summary: Equivariant Pretrained Transformer (EPT) is a novel pretraining framework designed to harmonize the geometric learning of small molecules and proteins.
EPT unifies the geometric modeling of multi-domain molecules via a block-enhanced representation that attends to a broader context for each atom.
Another key innovation of EPT is its block-level pretraining task, which allows for joint pretraining on datasets comprising both small molecules and proteins.
- Score: 23.189608074493997
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pretraining on a large number of unlabeled 3D molecules has showcased
superiority in various scientific applications. However, prior efforts
typically focus on pretraining models on a specific domain, either proteins or
small molecules, missing the opportunity to leverage the cross-domain
knowledge. To mitigate this gap, we introduce Equivariant Pretrained
Transformer (EPT), a novel pretraining framework designed to harmonize the
geometric learning of small molecules and proteins. Specifically, EPT unifies the
geometric modeling of multi-domain molecules via a block-enhanced representation
that attends to a broader context for each atom. Built upon the Transformer
framework, EPT is further enhanced with E(3) equivariance to facilitate the
accurate representation of 3D structures. Another key innovation of EPT is its
block-level pretraining task, which allows for joint pretraining on datasets
comprising both small molecules and proteins. Experimental evaluations on a
diverse group of benchmarks, including ligand binding affinity prediction,
molecular property prediction, and protein property prediction, show that EPT
significantly outperforms previous SOTA methods for affinity prediction, and
achieves the best or comparable performance with existing domain-specific
pretraining models for other tasks.
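The abstract's central technical claim is E(3) equivariance: scalar features should be invariant to rotations and translations of the input coordinates, while coordinate updates should co-rotate and co-translate with them. The following is a minimal sketch of that property, not the authors' EPT code: invariant features are updated from pairwise distances only, and coordinate updates are weighted sums of relative position vectors (names like `e3_equivariant_layer` are illustrative assumptions).

```python
# Hedged sketch of an E(3)-equivariant update (not the EPT implementation):
# scalar features depend only on invariants (pairwise distances), and the
# coordinate update is a linear combination of relative position vectors,
# so a rigid motion of the input produces the same rigid motion of the output.
import numpy as np

def e3_equivariant_layer(h, x):
    """One message-passing step on atom features h (n, d) and coordinates x (n, 3)."""
    n = x.shape[0]
    rel = x[:, None, :] - x[None, :, :]            # relative vectors r_ij
    dist2 = (rel ** 2).sum(-1, keepdims=True)      # squared distances (E(3)-invariant)
    # messages depend only on invariant quantities: features and distances
    m = np.tanh(h[:, None, :] + h[None, :, :] + dist2)
    h_new = h + m.sum(axis=1)                      # invariant feature update
    w = m.mean(-1, keepdims=True)                  # one scalar weight per atom pair
    mask = 1.0 - np.eye(n)[..., None]              # exclude self-pairs (r_ii = 0)
    x_new = x + (w * rel * mask).sum(axis=1) / (n - 1)  # equivariant coordinate update
    return h_new, x_new
```

A quick sanity check of the claimed symmetry: applying a random rotation `Q` and translation `t` to the input coordinates leaves `h_new` unchanged and maps `x_new` to `x_new @ Q + t`.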
Related papers
- Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials [51.342983349686556]
General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. We introduce Zatom-1, the first end-to-end, fully open-source foundation model that unifies generative and predictive learning of 3D molecules and materials.
arXiv Detail & Related papers (2026-02-24T20:52:39Z) - Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling [60.57197355431804]
Protein-protein interaction (PPI) modeling is a central challenge in biology. Deep learning has shown promise in forecasting the effects of mutations on PPIs, but is hindered by two primary constraints. We present a novel framework named Refine-PPI with two key enhancements.
arXiv Detail & Related papers (2026-01-08T19:29:04Z) - SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation [28.648225112411637]
Targeted protein degradation (TPD) induced by small molecules has emerged as a rapidly evolving modality in drug discovery.
DeepTernary is a novel deep learning-based approach that directly predicts ternary structures in an end-to-end manner.
arXiv Detail & Related papers (2025-02-26T06:33:24Z) - Structure Language Models for Protein Conformation Generation [66.42864253026053]
Traditional physics-based simulation methods often struggle with sampling equilibrium conformations.
Deep generative models have shown promise in generating protein conformations as a more efficient alternative.
We introduce Structure Language Modeling as a novel framework for efficient protein conformation generation.
arXiv Detail & Related papers (2024-10-24T03:38:51Z) - Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches [48.66541987908136]
Much work has been devoted to predicting binding affinity over the past decades. We note growing use of both traditional machine learning and deep learning models for predicting binding affinity. With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction.
arXiv Detail & Related papers (2024-09-30T03:40:49Z) - Autoregressive Enzyme Function Prediction with Multi-scale Multi-modality Fusion [11.278610817877578]
We introduce MAPred, a novel multi-modality and multi-scale model designed to autoregressively predict the EC number of proteins.
MAPred integrates both the primary amino acid sequence and the 3D tokens of proteins, employing a dual-pathway approach to capture comprehensive protein characteristics.
Evaluations on benchmark datasets, including New-392, Price, and New-815, demonstrate that our method outperforms existing models.
arXiv Detail & Related papers (2024-08-11T08:28:43Z) - Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL [1.840390797252648]
Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in silico predictions and in vitro observations.
We propose eGRAL, a novel graph neural network architecture designed for predicting binding affinity changes from amino acid substitutions in protein complexes.
eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models.
arXiv Detail & Related papers (2024-05-03T10:33:19Z) - Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z) - Pre-Training Protein Bi-level Representation Through Span Mask Strategy On 3D Protein Chains [1.2893576217358405]
We introduce a span mask pre-training strategy on 3D protein chains to learn meaningful representations of both residues and atoms.
This leads to a simple yet effective approach to learning protein representation suitable for diverse downstream tasks.
arXiv Detail & Related papers (2024-02-02T15:07:09Z) - Learning to design protein-protein interactions with enhanced generalization [14.983309106361899]
We construct PPIRef, the largest non-redundant dataset of 3D protein-protein interactions.
We leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model generalizing across diverse protein-binder variants.
We fine-tune PPIformer to predict effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function.
arXiv Detail & Related papers (2023-10-27T22:22:44Z) - Automated 3D Pre-Training for Molecular Property Prediction [54.15788181794094]
We propose a novel 3D pre-training framework (dubbed 3D PGT).
It pre-trains a model on 3D molecular graphs, and then fine-tunes it on molecular graphs without 3D structures.
Extensive experiments on 2D molecular graphs are conducted to demonstrate the accuracy, efficiency and generalization ability of the proposed 3D PGT.
arXiv Detail & Related papers (2023-06-13T14:43:13Z) - Multi-task Bioassay Pre-training for Protein-ligand Binding Affinity
Prediction [26.530876904939163]
We propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction.
MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels.
arXiv Detail & Related papers (2023-06-08T02:29:49Z) - Integration of Pre-trained Protein Language Models into Geometric Deep
Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z) - Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium.
Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
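The denoising objective this summary describes can be stated compactly: perturb equilibrium 3D coordinates with Gaussian noise and train a model to regress the noise that was added. Below is a minimal sketch of that objective, not the paper's code; `toy_model` is a hypothetical stand-in for a real equivariant network.

```python
# Hedged sketch of coordinate-denoising pretraining (not the paper's code):
# add Gaussian noise to equilibrium coordinates and train the model to
# predict the added noise via a mean-squared-error regression loss.
import numpy as np

def denoising_loss(model, coords, sigma=0.1, rng=None):
    """MSE between the model's prediction on noisy coordinates and the noise."""
    rng = rng or np.random.default_rng()
    noise = sigma * rng.standard_normal(coords.shape)  # perturb equilibrium geometry
    pred = model(coords + noise)                       # model only sees noisy input
    return ((pred - noise) ** 2).mean()                # regress the added noise

def toy_model(x):
    # hypothetical placeholder: a real setup would use an equivariant GNN/Transformer
    return x - x.mean(axis=0)
```

Minimizing this loss at small noise scales is what connects denoising to learning an approximate force field on the equilibrium structures.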
arXiv Detail & Related papers (2022-05-31T22:28:34Z) - Learning Geometrically Disentangled Representations of Protein Folding
Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z) - Accurate Machine Learned Quantum-Mechanical Force Fields for
Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph
Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z) - G-VAE, a Geometric Convolutional VAE for Protein Structure Generation [41.66010308405784]
We introduce a joint geometric and neural-network approach for comparing, deforming, and generating 3D protein structures.
Our method is able to generate plausible structures, different from the structures in the training data.
arXiv Detail & Related papers (2021-06-22T16:52:48Z) - BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z) - EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based
Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z) - CogMol: Target-Specific and Selective Drug Design for COVID-19 Using
Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.