Learning Geometrically Disentangled Representations of Protein Folding
Simulations
- URL: http://arxiv.org/abs/2205.10423v1
- Date: Fri, 20 May 2022 19:38:00 GMT
- Title: Learning Geometrically Disentangled Representations of Protein Folding
Simulations
- Authors: N. Joseph Tatro, Payel Das, Pin-Yu Chen, Vijil Chenthamarakshan,
Rongjie Lai
- Abstract summary: This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
- Score: 72.03095377508856
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Massive molecular simulations of drug-target proteins have been used as a
tool to understand disease mechanism and develop therapeutics. This work
focuses on learning a generative neural network on a structural ensemble of a
drug-target protein, e.g. SARS-CoV-2 Spike protein, obtained from
computationally expensive molecular simulations. Model tasks involve
characterizing the distinct structural fluctuations of the protein bound to
various drug molecules, as well as efficient generation of protein
conformations that can serve as a complement to a molecular simulation engine.
Specifically, we present a geometric autoencoder framework to learn separate
latent space encodings of the intrinsic and extrinsic geometries of the protein
structure. For this purpose, the proposed Protein Geometric AutoEncoder
(ProGAE) model is trained on the protein contact map and the orientation of the
backbone bonds of the protein. Using ProGAE latent embeddings, we reconstruct
and generate the conformational ensemble of a protein at or near the
experimental resolution, while gaining better interpretability and
controllability in terms of protein structure generation from the learned latent
space. Additionally, ProGAE models are transferable to a different state of the
same protein or to a new protein of different size, where only the dense layer
decoding from the latent representation needs to be retrained. Results show
that our geometric learning-based method enjoys both accuracy and efficiency
for generating complex structural variations, charting the path toward scalable
and improved approaches for analyzing and enhancing high-cost simulations of
drug-target proteins.
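Below is a minimal sketch of the two-branch design described in the abstract, written in PyTorch. It is a hypothetical reconstruction, not the authors' released code: the class and layer names (GeometricAutoencoder, intrinsic_enc, extrinsic_enc), the MLP layer sizes, and the exact reconstruction target are illustrative assumptions; the published ProGAE architecture may differ.
```python
# Hypothetical sketch of a ProGAE-style geometric autoencoder, reconstructed
# from the abstract alone. Two encoders map intrinsic geometry (the contact
# map) and extrinsic geometry (unit backbone bond orientations) into separate
# latent spaces; a dense decoder maps the concatenated latents back to bond
# orientations. All names and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn


class GeometricAutoencoder(nn.Module):
    def __init__(self, n_residues: int, latent_dim: int = 64):
        super().__init__()
        self.n_residues = n_residues
        # Intrinsic branch: contact map (n_residues x n_residues) -> latent.
        self.intrinsic_enc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_residues * n_residues, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Extrinsic branch: bond orientations ((n_residues - 1) x 3) -> latent.
        self.extrinsic_enc = nn.Sequential(
            nn.Flatten(),
            nn.Linear((n_residues - 1) * 3, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Dense decoder from the latent representation. Per the abstract,
        # this is the only part retrained when transferring the model to a
        # different state of the protein or to a protein of another size.
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, (n_residues - 1) * 3),
        )

    def forward(self, contact_map: torch.Tensor, bond_orient: torch.Tensor):
        z_int = self.intrinsic_enc(contact_map)   # intrinsic latent code
        z_ext = self.extrinsic_enc(bond_orient)   # extrinsic latent code
        out = self.decoder(torch.cat([z_int, z_ext], dim=-1))
        out = out.view(-1, self.n_residues - 1, 3)
        # Renormalize so decoded bond orientations stay unit vectors.
        return out / out.norm(dim=-1, keepdim=True).clamp_min(1e-8)


# Usage sketch on toy data: reconstruct bond orientations for a small batch.
model = GeometricAutoencoder(n_residues=128)
contacts = torch.rand(8, 128, 128)                      # toy contact maps
bonds = torch.randn(8, 127, 3)
bonds = bonds / bonds.norm(dim=-1, keepdim=True)        # toy unit vectors
recon = model(contacts, bonds)
loss = ((recon - bonds) ** 2).sum(dim=-1).mean()        # reconstruction loss

# Transfer sketch: freeze both encoders and retrain only the dense decoder,
# mirroring the transfer protocol the abstract describes.
for enc in (model.intrinsic_enc, model.extrinsic_enc):
    for p in enc.parameters():
        p.requires_grad_(False)
optimizer = torch.optim.Adam(model.decoder.parameters(), lr=1e-3)
```
Keeping the intrinsic and extrinsic codes in separate latent spaces is what would give the interpretability and controllability claimed above: the two kinds of geometric variation can be manipulated independently when generating conformations.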
Related papers
- Long-context Protein Language Model [76.95505296417866]
Self-supervised training of language models (LMs) on protein sequences has seen great success in learning meaningful representations and in generative drug design.
Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths.
We propose LC-PLM, based on an alternative protein LM architecture, BiMamba-S, built on selective structured state-space models.
We also introduce its graph-contextual variant, LC-PLM-G, which contextualizes protein-protein interaction graphs for a second stage of training.
arXiv Detail & Related papers (2024-10-29T16:43:28Z)
- Top-down machine learning of coarse-grained protein force-fields [2.1485350418225244]
Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential.
Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data.
Applying Markov State Models to the coarse-grained simulations then predicts native-like conformations of the simulated proteins.
arXiv Detail & Related papers (2023-06-20T08:31:24Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct structural surgery on pLMs, implanting a lightweight structural adapter that endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks [68.90692290665648]
We integrate knowledge learned by protein language models into several state-of-the-art geometric networks.
Our findings show an overall improvement of 20% over baselines.
Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin.
arXiv Detail & Related papers (2022-12-07T04:04:04Z)
- Protein Sequence and Structure Co-Design with Equivariant Translation [19.816174223173494]
Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models.
We propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state.
Our model consists of a trigonometry-aware encoder that reasons about geometric constraints and interactions from context features.
All protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process.
arXiv Detail & Related papers (2022-10-17T06:00:12Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models [3.5450828190071646]
An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions.
We introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches.
arXiv Detail & Related papers (2022-05-26T16:10:09Z)
- G-VAE, a Geometric Convolutional VAE for Protein Structure Generation [41.66010308405784]
We introduce a joint geometric neural network approach for comparing, deforming, and generating 3D protein structures.
Our method generates plausible structures that differ from those in the training data.
arXiv Detail & Related papers (2021-06-22T16:52:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.