Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
- URL: http://arxiv.org/abs/2301.12485v3
- Date: Tue, 6 Jun 2023 21:54:16 GMT
- Authors: Yeqing Lin, Mohammed AlQuraishi
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Proteins power a vast array of functional processes in living cells. The
capability to create new proteins with designed structures and functions would
thus enable the engineering of cellular behavior and development of
protein-based therapeutics and materials. Structure-based protein design aims
to find structures that are designable (can be realized by a protein sequence),
novel (have dissimilar geometry from natural proteins), and diverse (span a
wide range of geometries). While advances in protein structure prediction have
made it possible to predict structures of novel protein sequences, the
combinatorially large space of sequences and structures limits the practicality
of search-based methods. Generative models provide a compelling alternative, by
implicitly learning the low-dimensional structure of complex data
distributions. Here, we leverage recent advances in denoising diffusion
probabilistic models and equivariant neural networks to develop Genie, a
generative model of protein structures that performs discrete-time diffusion
using a cloud of oriented reference frames in 3D space. Through in silico
evaluations, we demonstrate that Genie generates protein backbones that are
more designable, novel, and diverse than existing models. This indicates that
Genie is capturing key aspects of the distribution of protein structure space
and facilitates protein design with high success rates. Code for generating new
proteins and training new versions of Genie is available at
https://github.com/aqlaboratory/genie.
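The abstract names the two ingredients, discrete-time diffusion and a cloud of oriented reference frames, but gives no formulas. Below is a minimal NumPy sketch of one way they could fit together. It rests on assumptions that this summary does not state: C-alpha coordinates are noised with the standard DDPM closed form q(x_t | x_0), using the common linear β schedule (1e-4 to 0.02 over 1000 steps, chosen here only for illustration). A per-residue rotation is then derived from the noised trace by a Frenet-style construction over three consecutive C-alpha atoms. The function names `alpha_bars_linear`, `noise_ca`, and `frenet_frames` are hypothetical and are not Genie's API. The equivariant denoising network the abstract mentions is not shown. For the actual implementation, see the linked repository.

```python
import numpy as np


def alpha_bars_linear(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (illustrative choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_ca(x0: np.ndarray, t: int, alpha_bars: np.ndarray, rng: np.random.Generator):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot.

    x0: (N, 3) C-alpha coordinates; t: 0-indexed diffusion step.
    Returns the noised coordinates and the Gaussian noise a denoiser would learn to predict.
    """
    a_t = alpha_bars[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps, eps


def frenet_frames(ca: np.ndarray) -> np.ndarray:
    """Hypothetical Frenet-style rotations (N-2, 3, 3) for interior residues.

    Frame k belongs to residue k+1 and is built from CA_k, CA_{k+1}, CA_{k+2}.
    Its columns are the tangent, normal and binormal unit vectors; its origin
    would be CA_{k+1}. Assumes no three consecutive atoms are collinear.
    """
    bonds = ca[1:] - ca[:-1]                                   # (N-1, 3) virtual bonds
    tangent = bonds / np.linalg.norm(bonds, axis=-1, keepdims=True)
    binormal = np.cross(tangent[:-1], tangent[1:])             # (N-2, 3), normal to the local plane
    binormal /= np.linalg.norm(binormal, axis=-1, keepdims=True)
    normal = np.cross(binormal, tangent[1:])                   # unit, orthogonal to tangent and binormal
    return np.stack([tangent[1:], normal, binormal], axis=-1)  # columns (t, n, b), det = +1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ca = np.cumsum(rng.standard_normal((16, 3)), axis=0)  # toy random-walk trace, not a real protein
    alpha_bars = alpha_bars_linear(1000)
    xt, eps = noise_ca(ca, t=500, alpha_bars=alpha_bars, rng=rng)
    frames = frenet_frames(xt)  # oriented residue cloud: origins xt[1:-1], rotations frames
    print(frames.shape)         # (14, 3, 3)
```

Because the frames use only coordinate differences and cross products, rotating the input trace by a proper rotation Q rotates every frame by Q. That is the equivariance property an oriented-residue representation relies on.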
Related papers
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation (2024-10-31)
  We introduce a novel pre-training strategy for protein foundation models.
  It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
  Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
- A Latent Diffusion Model for Protein Structure Generation (2023-05-06)
  We propose a latent diffusion model that can reduce the complexity of protein modeling.
  We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
- Structure-informed Language Models Are Protein Designers (2023-02-03)
  We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
  We conduct a structural surgery on pLMs, in which a lightweight structural adapter is implanted into the pLMs and endows them with structural awareness.
  Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
- Protein Sequence and Structure Co-Design with Equivariant Translation (2022-10-17)
  Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models.
  We propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state.
  Our model consists of a trigonometry-aware encoder that reasons about geometric constraints and interactions from context features.
  All amino acids are updated in one shot at each translation step, which significantly accelerates the inference process.
- Protein structure generation via folding diffusion (2022-09-30)
  We present a new diffusion-based generative model that designs protein backbone structures.
  We generate new structures by denoising from a random, unfolded state towards a stable folded structure.
  As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.
- Learning Geometrically Disentangled Representations of Protein Folding Simulations (2022-05-20)
  This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
  Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
  Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
- Structure-aware Protein Self-supervised Learning (2022-04-06)
  We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
  In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
  We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
- G-VAE, a Geometric Convolutional VAE for Protein Structure Generation (2021-06-22)
  We introduce a joint geometric-neural networks approach for comparing, deforming and generating 3D protein structures.
  Our method is able to generate plausible structures, different from the structures in the training data.
- Functional Protein Structure Annotation Using a Deep Convolutional Generative Adversarial Network (2021-04-18)
  We introduce the use of a Deep Convolutional Generative Adversarial Network (DCGAN) to classify protein structures based on their functionality.
  We train the DCGAN on three-dimensional (3D) decoy and native protein structures in order to generate and discriminate 3D protein structures.
- Transfer Learning for Protein Structure Classification at Low Resolution (2020-08-11)
  We show that it is possible to make accurate (≥80%) predictions of protein class and architecture from structures determined at low (≤3 Å) resolution.
  We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.