Protein Sequence and Structure Co-Design with Equivariant Translation
- URL: http://arxiv.org/abs/2210.08761v1
- Date: Mon, 17 Oct 2022 06:00:12 GMT
- Title: Protein Sequence and Structure Co-Design with Equivariant Translation
- Authors: Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang
- Abstract summary: Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models.
We propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state.
Our model consists of a trigonometry-aware encoder that reasons about geometric constraints and interactions from context features.
All protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process.
- Score: 19.816174223173494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proteins are macromolecules that perform essential functions in all living
organisms. Designing novel proteins with specific structures and desired
functions has been a long-standing challenge in the field of bioengineering.
Existing approaches generate both protein sequence and structure using either
autoregressive models or diffusion models, both of which suffer from high
inference costs. In this paper, we propose a new approach capable of protein
sequence and structure co-design, which iteratively translates both protein
sequence and structure into the desired state from random initialization, based
on context features given a priori. Our model consists of a trigonometry-aware
encoder that reasons about geometric constraints and interactions from context
features, and a roto-translation equivariant decoder that translates protein
sequence and structure interdependently. Notably, all protein amino acids are
updated in one shot in each translation step, which significantly accelerates
the inference process. Experimental results across multiple tasks show that our
model outperforms previous state-of-the-art baselines by a large margin, and is
able to design proteins of high fidelity in terms of both sequence and
structure, with running times orders of magnitude shorter than those of
sampling-based methods.
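The loop below is a minimal, non-equivariant sketch of the co-design procedure the abstract describes: sequence logits and coordinates start from random noise, and all residues are updated in one shot per translation step. The `SketchCoDesigner` module, its layer sizes, and its interfaces are illustrative assumptions, not the authors' architecture.
```python
import torch
import torch.nn as nn

class SketchCoDesigner(nn.Module):
    # Hypothetical stand-in for the paper's trigonometry-aware encoder and
    # roto-translation equivariant decoder; this toy version is not equivariant.
    def __init__(self, n_types=20, hidden=64):
        super().__init__()
        self.enc = nn.Linear(n_types + 3, hidden)
        self.dec_seq = nn.Linear(hidden, n_types)
        self.dec_xyz = nn.Linear(hidden, 3)

    def forward(self, seq_logits, coords):
        h = torch.relu(self.enc(torch.cat([seq_logits, coords], dim=-1)))
        # One-shot update: every residue's type logits and position move together.
        return seq_logits + self.dec_seq(h), coords + self.dec_xyz(h)

def co_design(model, num_residues=100, num_steps=10):
    # Start from random initialization and iteratively translate both
    # sequence and structure toward the desired state.
    seq_logits = torch.randn(num_residues, 20)
    coords = torch.randn(num_residues, 3)
    for _ in range(num_steps):
        seq_logits, coords = model(seq_logits, coords)
    return seq_logits.argmax(dim=-1), coords

designed_seq, designed_xyz = co_design(SketchCoDesigner())
```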
Related papers
- Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation [55.93511121486321]
We introduce FoldFlow-2, a novel sequence-conditioned flow matching model for protein structure generation.
We train FoldFlow-2 at scale on a new dataset that is an order of magnitude larger than PDB datasets of prior works.
We empirically observe that FoldFlow-2 outperforms previous state-of-the-art protein structure-based generative models.
arXiv Detail & Related papers (2024-05-30T17:53:50Z)
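As a point of reference for the flow-matching objective FoldFlow-2 builds on, the sketch below shows the standard Euclidean conditional flow-matching loss with linear interpolation paths; FoldFlow-2 itself operates on SE(3) frames with sequence conditioning, which this simplification omits. The `vector_field` callable is a hypothetical stand-in for the learned model.
```python
import torch
import torch.nn as nn

def flow_matching_loss(vector_field: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    # x1: a batch of data points, shape (B, D); x0: matching noise samples.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], 1)            # uniform time in [0, 1)
    xt = (1 - t) * x0 + t * x1                # linear interpolation path
    target = x1 - x0                          # the path's constant velocity
    return ((vector_field(xt, t) - target) ** 2).mean()
```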
- Diffusion on language model embeddings for protein sequence generation [0.5442686600296733]
We introduce DiMA, a model that leverages continuous diffusion to generate amino acid sequences.
We quantitatively illustrate the impact of the design choices that lead to its superior performance.
Our approach consistently produces novel, diverse protein sequences that accurately reflect the inherent structural and functional diversity of the protein space.
arXiv Detail & Related papers (2024-03-06T14:15:20Z)
- FoldToken: Learning Protein Language via Vector Quantization and Beyond [56.19308144551836]
We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols.
We refer to the learned symbols as FoldTokens, and the sequence of FoldTokens serves as a new protein language.
arXiv Detail & Related papers (2024-02-04T12:18:51Z)
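FoldToken's discrete symbols rest on vector quantization; the sketch below shows the generic nearest-neighbor quantization step, with the codebook and embedding shapes as assumptions rather than the paper's actual tokenizer.
```python
import torch

def quantize(z: torch.Tensor, codebook: torch.Tensor):
    # z: (L, D) per-residue embeddings; codebook: (K, D) learned codes.
    dists = torch.cdist(z, codebook)          # (L, K) pairwise distances
    ids = dists.argmin(dim=-1)                # discrete symbol id per residue
    return ids, codebook[ids]                 # tokens and their embeddings

tokens, quantized = quantize(torch.randn(128, 32), torch.randn(256, 32))
```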
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness (a generic adapter of this kind is sketched below).
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
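The "structural surgery" amounts to inserting a small conditioning module into a frozen pLM. The sketch below shows the generic bottleneck-adapter pattern that idea suggests; the dimensions and fusion scheme are assumptions, not LM-Design's actual design.
```python
import torch
import torch.nn as nn

class StructuralAdapter(nn.Module):
    # Generic bottleneck adapter fused with structure features; dimensions
    # and fusion scheme are illustrative, not LM-Design's exact design.
    def __init__(self, d_model=1280, d_struct=128, d_bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model + d_struct, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, h_plm, h_struct):
        # Fuse frozen pLM hidden states with structure features, project
        # back, and add residually so the pLM's knowledge is preserved.
        z = torch.relu(self.down(torch.cat([h_plm, h_struct], dim=-1)))
        return h_plm + self.up(z)
```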
- Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds [0.0]
Structure-based protein design aims to find structures that are designable, novel, and diverse.
Generative models provide a compelling alternative by implicitly learning the low-dimensional structure of complex data.
We develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space.
arXiv Detail & Related papers (2023-01-29T16:44:19Z)
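Genie's discrete-time diffusion follows the standard DDPM forward process; the sketch below noises plain 3D coordinates, whereas Genie diffuses clouds of oriented reference frames. The `alpha_bar` noise schedule is assumed to be precomputed.
```python
import torch

def ddpm_forward(x0: torch.Tensor, t: int, alpha_bar: torch.Tensor):
    # x0: (L, 3) clean coordinates; alpha_bar: precomputed cumulative schedule.
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps    # the denoiser is trained to recover eps from (xt, t)
```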
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models [3.5450828190071646]
An important task in bioengineering is designing proteins with specific 3D structures and chemical properties which enable targeted functions.
We introduce a generative model of both protein structure and sequence that can operate at significantly larger scales than previous molecular generative modeling approaches.
arXiv Detail & Related papers (2022-05-26T16:10:09Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method is both accurate and efficient at generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- BERTology Meets Biology: Interpreting Attention in Protein Language Models [124.8966298974842]
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
arXiv Detail & Related papers (2020-06-26T21:50:17Z)
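The claim that attention connects residues far apart in sequence but close in 3D can be checked with a simple contact-enrichment score; the sketch below is one such diagnostic, where the 8 Å cutoff and minimum sequence separation are conventional choices rather than the paper's exact protocol.
```python
import torch

def attention_contact_enrichment(attn, coords, cutoff=8.0, min_sep=6):
    # attn: (L, L) attention weights; coords: (L, 3) C-alpha coordinates.
    L = attn.shape[0]
    dist = torch.cdist(coords, coords)                     # 3D distances
    sep = (torch.arange(L)[:, None] - torch.arange(L)[None, :]).abs()
    distant = sep >= min_sep                               # far apart in sequence
    contact = (dist < cutoff) & distant                    # yet close in space
    # Ratio > 1 means attention concentrates on structural contacts.
    return attn[contact].mean() / attn[distant].mean()
```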
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.