Deep Generative Modeling for Protein Design
- URL: http://arxiv.org/abs/2109.13754v1
- Date: Tue, 31 Aug 2021 14:38:26 GMT
- Title: Deep Generative Modeling for Protein Design
- Authors: Alexey Strokach, Philip M. Kim
- Abstract summary: Deep learning approaches have produced breakthroughs in fields such as image classification and natural language processing.
Many generative models of proteins have been developed that encompass all known protein sequences, model specific protein families, or extrapolate the dynamics of individual proteins.
We discuss five classes of generative models that have been most successful at modeling proteins and provide a framework for model-guided protein design.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning approaches have produced substantial breakthroughs in fields
such as image classification and natural language processing and are making
rapid inroads in the area of protein design. Many generative models of proteins
have been developed that encompass all known protein sequences, model specific
protein families, or extrapolate the dynamics of individual proteins. Those
generative models can learn protein representations that are often more
informative of protein structure and function than hand-engineered features.
Furthermore, they can be used to quickly propose millions of novel proteins
that resemble the native counterparts in terms of expression level, stability,
or other attributes. The protein design process can further be guided by
discriminative oracles to select candidates with the highest probability of
having the desired properties. In this review, we discuss five classes of
generative models that have been most successful at modeling proteins and
provide a framework for model-guided protein design.
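The generate-then-rank loop the abstract describes (a generative model proposes many candidates, a discriminative oracle selects the most promising) can be sketched as below. Both `toy_generator` and `toy_oracle` are hypothetical stand-ins for a trained generative model and a property predictor; they are not from the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_generator(n_candidates, length=10, rng=None):
    """Stand-in for a deep generative model: proposes candidate sequences."""
    rng = rng or random.Random(0)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n_candidates)]

def toy_oracle(seq):
    """Stand-in for a discriminative oracle (e.g. a stability predictor).
    Here: fraction of hydrophobic residues, purely for illustration."""
    hydrophobic = set("AVILMFWY")
    return sum(aa in hydrophobic for aa in seq) / len(seq)

def model_guided_design(n_candidates=1000, top_k=5):
    """Propose many candidates, score each with the oracle, keep the best."""
    candidates = toy_generator(n_candidates)
    ranked = sorted(candidates, key=toy_oracle, reverse=True)
    return ranked[:top_k]

best = model_guided_design()
```

In practice the oracle call dominates the cost, which is why cheap learned predictors (rather than wet-lab assays) are used to triage millions of proposals.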
Related papers
- Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX [14.927425008686692]
We introduce HelixProtX, a system built upon a large multimodal model, to support any-to-any protein modality generation.
HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-12T14:03:02Z)
- ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training [82.37346937497136]
We propose a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks.
ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs.
By developing a specialized protein vocabulary, we equip the model with the capability to predict not just natural language but also proteins from a vast pool of candidates.
arXiv Detail & Related papers (2024-02-28T01:29:55Z)
- Generative artificial intelligence for de novo protein design [1.2021565114959365]
Generative architectures seem adept at generating novel, yet realistic proteins.
Design protocols now achieve experimental success rates nearing 20%.
Despite extensive progress, there are clear field-wide challenges.
arXiv Detail & Related papers (2023-10-15T00:02:22Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
- A Text-guided Protein Design Framework [106.79061950107922]
We propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design.
ProteinDT consists of three sequential steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator, which generates the protein representation from the text modality; and a decoder, which creates protein sequences from that representation.
We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
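The three-stage data flow described above (text encoder, facilitator, decoder) can be illustrated with toy functions. Every function here is a hypothetical stand-in, not ProteinDT's actual architecture; the point is only how the stages compose, with deterministic pseudo-random numbers in place of learned networks.

```python
import hashlib
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def clap_text_encoder(text):
    """Stand-in for the text branch of a CLAP-style aligner: maps a
    description to a fixed-size embedding (deterministic toy numbers)."""
    seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(8)]

def facilitator(text_embedding):
    """Stand-in for the facilitator: maps the text embedding into the
    protein representation space (a trivial linear map here)."""
    return [0.9 * x for x in text_embedding]

def decoder(protein_embedding, length=12):
    """Stand-in for the decoder: emits a sequence conditioned on the
    protein representation."""
    rng = random.Random(sum(round(x * 1e6) for x in protein_embedding))
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def text_guided_design(description):
    z_text = clap_text_encoder(description)   # step 1: align/encode text
    z_protein = facilitator(z_text)           # step 2: text -> protein space
    return decoder(z_protein)                 # step 3: representation -> sequence

seq = text_guided_design("a thermostable beta-barrel")
```

The design choice worth noting is the facilitator: because the decoder only ever sees protein-space representations, a learned bridge from text embeddings lets a single decoder serve both text-conditioned generation and editing.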
arXiv Detail & Related papers (2023-02-09T12:59:16Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct structural surgery on pLMs, implanting a lightweight structural adapter that endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC [1.0499611180329804]
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations.
We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models.
By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins.
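Composing several unsupervised models into one acceptance criterion, as this summary describes, amounts to running a Metropolis sampler over sequence space with a weighted sum of model scores. The sketch below uses uniform point-mutation proposals and two hypothetical scoring functions; the paper's gradient-informed discrete proposals are replaced by this simpler stand-in.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def score_a(seq):
    """Stand-in for one unsupervised model (e.g. a language-model
    log-likelihood). Here: penalize net charge, illustration only."""
    return -abs(sum(+1 if aa in "KR" else -1 if aa in "DE" else 0 for aa in seq))

def score_b(seq):
    """Stand-in for a second model (e.g. a structure-based score)."""
    return sum(aa in "AVILMF" for aa in seq) / len(seq)

def composed_score(seq, weights=(1.0, 5.0)):
    """Product of experts: a weighted sum of per-model log-scores."""
    return weights[0] * score_a(seq) + weights[1] * score_b(seq)

def mcmc_evolve(seq, steps=2000, temperature=0.5, rng=None):
    """Metropolis sampling over sequences via single point mutations."""
    rng = rng or random.Random(0)
    current, cur_score = seq, composed_score(seq)
    for _ in range(steps):
        pos = rng.randrange(len(current))
        proposal = current[:pos] + rng.choice(ALPHABET) + current[pos + 1:]
        new_score = composed_score(proposal)
        # Accept uphill moves always, downhill moves with Boltzmann probability.
        accept = math.exp(min(0.0, (new_score - cur_score) / temperature))
        if rng.random() < accept:
            current, cur_score = proposal, new_score
    return current, cur_score

evolved, final_score = mcmc_evolve("MKTAYIAKQRQISFVK")
```

Swapping or re-weighting the scoring functions changes the stationary distribution without touching the sampler, which is the "plug and play" property the abstract refers to.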
arXiv Detail & Related papers (2022-12-20T00:26:23Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z)
- OntoProtein: Protein Pretraining With Gene Ontology Embedding [36.92674447484136]
We propose OntoProtein, the first general framework that incorporates the structure of GO (Gene Ontology) into protein pre-training models.
We construct a novel large-scale knowledge graph consisting of GO and its related proteins, in which every node is described by gene annotation text or a protein sequence.
arXiv Detail & Related papers (2022-01-23T14:49:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.