Generative De Novo Protein Design with Global Context
- URL: http://arxiv.org/abs/2204.10673v1
- Date: Thu, 21 Apr 2022 02:55:01 GMT
- Title: Generative De Novo Protein Design with Global Context
- Authors: Cheng Tan, Zhangyang Gao, Jun Xia, Stan Z. Li
- Abstract summary: The inverse of protein structure prediction aims to obtain a novel protein sequence that will fold into the defined structure.
Recent works on computational protein design have studied designing sequences for the desired backbone structure with local positional information.
We propose the Global-Context Aware generative de novo protein design method (GCA), consisting of local and global modules.
- Score: 36.21545615114117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The linear sequence of amino acids determines protein structure and function.
Protein design, known as the inverse of protein structure prediction, aims to
obtain a novel protein sequence that will fold into the defined structure.
Recent works on computational protein design have studied designing sequences
for the desired backbone structure with local positional information and
achieved competitive performance. However, similar local environments in
different backbone structures may result in different amino acids, indicating
that protein structure's global context matters. Thus, we propose the
Global-Context Aware generative de novo protein design method (GCA), consisting
of local and global modules. While local modules focus on relationships between
neighbor amino acids, global modules explicitly capture non-local contexts.
Experimental results demonstrate that the proposed GCA method outperforms
state-of-the-art methods on de novo protein design. Our code and pretrained model will
be released.
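The abstract does not say how the local and global modules are built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes (hypothetically) that a local module is attention restricted to each residue's k nearest neighbors in 3D space, that a global module is attention over all residues, and that the two outputs are combined by a residual sum. All function names and the parameter `k` are illustrative.

```python
import math


def knn_indices(coords, k):
    """For each residue, return the indices of its k nearest other residues."""
    result = []
    for i, ci in enumerate(coords):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        result.append([j for _, j in dists[:k]])
    return result


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def local_module(feats, coords, k):
    """Assumed local module: each residue attends only to its k nearest neighbors."""
    neighbors = knn_indices(coords, k)
    out = []
    for i, f in enumerate(feats):
        nbr_feats = [feats[j] for j in neighbors[i]]
        out.append(attend(f, nbr_feats, nbr_feats))
    return out


def global_module(feats):
    """Assumed global module: each residue attends to every residue."""
    return [attend(f, feats, feats) for f in feats]


def gca_like_block(feats, coords, k=2):
    """Residual sum of input, local, and global outputs (an illustrative choice)."""
    local = local_module(feats, coords, k)
    glob = global_module(feats)
    return [
        [x + l + g for x, l, g in zip(fi, li, gi)]
        for fi, li, gi in zip(feats, local, glob)
    ]


if __name__ == "__main__":
    # Toy backbone: four residues on a line, the last one far from the rest.
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in gca_like_block(feats, coords, k=2):
        print(row)
```

In this sketch the distant fourth residue influences the others only through the global path, which is the kind of non-local context the abstract says local modules alone would miss.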
Related papers
- Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance? [4.7077642423577775]
We propose ProtLOCA, a local geometry alignment method based solely on amino acid structure representation.
Our method outperforms existing sequence- and structure-based representation learning methods by more quickly and accurately matching structurally consistent protein domains.
(2024-06-28T08:54:37Z)
- Functional Protein Design with Local Domain Alignment [39.79713846491306]
We propose Protein-Annotation Alignment Generation (PAAG), a multi-modality protein design framework that integrates textual annotations extracted from protein databases for controllable generation in sequence space.
Specifically, within a multi-level alignment module, PAAG can explicitly generate proteins containing specific domains conditioned on the corresponding domain annotations.
Our experimental results underscore the superiority of the aligned protein representations from PAAG over 7 prediction tasks.
(2024-04-18T09:37:54Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
(2024-03-30T05:32:42Z)
- Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design [12.585697288315846]
We propose NAEPro, a model to jointly design protein sequence and structure based on automatically detected functional sites.
NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence.
Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors.
(2023-10-06T16:08:41Z)
- Joint Design of Protein Sequence and Structure based on Motifs [11.731131799546489]
We propose GeoPro, a method to design protein backbone structure and sequence jointly.
GeoPro is powered by an equivariant encoder for three-dimensional (3D) backbone structure and a protein sequence decoder guided by 3D geometry.
Our method discovers novel $\beta$-lactamases and myoglobins that are not present in the Protein Data Bank (PDB) or UniProt.
(2023-10-04T03:07:03Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct structural surgery on pLMs, in which a lightweight structural adapter is implanted into the pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
(2023-02-03T10:49:52Z)
- Protein Sequence and Structure Co-Design with Equivariant Translation [19.816174223173494]
Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models.
We propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state.
Our model consists of a trigonometry-aware encoder that reasons about geometric constraints and interactions from context features.
All protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process.
(2022-10-17T06:00:12Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
(2022-09-30T01:46:38Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
(2022-05-20T19:38:00Z)
- Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
(2022-04-06T02:18:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.