Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
- URL: http://arxiv.org/abs/2405.08205v3
- Date: Wed, 17 Jul 2024 15:14:32 GMT
- Title: Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
- Authors: Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li
- Abstract summary: We propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families.
Our key idea is to generate an enzyme's amino acid sequence together with the 3D coordinates of its residues, based on functionally important sites and substrates corresponding to a desired catalytic function.
- Score: 16.5169461287914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Enzymes are genetically encoded biocatalysts capable of accelerating chemical reactions. How can we automatically design functional enzymes? In this paper, we propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families. Our key idea is to generate an enzyme's amino acid sequence and their three-dimensional (3D) coordinates based on functionally important sites and substrates corresponding to a desired catalytic function. These sites are automatically mined from enzyme databases. EnzyGen consists of a novel interleaving network of attention and neighborhood equivariant layers, which captures both long-range correlation in an entire protein sequence and local influence from nearest amino acids in 3D space. To learn the generative model, we devise a joint training objective, including a sequence generation loss, a position prediction loss and an enzyme-substrate interaction loss. We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity. These findings demonstrate EnzyGen's superior capability in designing well-folded and effective enzymes binding to specific substrates with high affinities.
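The abstract describes an interleaving network of attention layers (long-range correlation across the whole sequence) and neighborhood equivariant layers (local influence from the nearest amino acids in 3D). Below is a minimal NumPy sketch of that idea. It assumes single-head attention followed by an EGNN-style update (a standard E(n)-equivariant graph layer, swapped in here for illustration) restricted to each residue's k nearest neighbors. It is not EnzyGen's published layer. All function and parameter names are hypothetical, and the sketch omits the functional-site and substrate conditioning as well as the three training losses.

```python
import numpy as np


def self_attention(h, w_q, w_k, w_v):
    """Single-head self-attention over all residues (global, sequence-wide context)."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return h + attn @ v  # residual connection


def knn_equivariant_layer(h, x, k, w_msg, w_coord, w_node):
    """EGNN-style update restricted to each residue's k nearest neighbors in 3D."""
    n = h.shape[0]
    diff = x[:, None, :] - x[None, :, :]  # (n, n, 3) relative position vectors
    dist2 = (diff ** 2).sum(axis=-1)  # (n, n) squared distances, rotation-invariant
    np.fill_diagonal(dist2, np.inf)  # a residue is not its own neighbor
    nbrs = np.argsort(dist2, axis=1)[:, :k]  # (n, k) nearest-neighbor indices
    h_new, x_new = np.empty_like(h), np.empty_like(x)
    for i in range(n):
        j = nbrs[i]
        # Messages see only invariant inputs: node features and squared distances.
        m_in = np.concatenate(
            [np.repeat(h[i][None, :], k, axis=0), h[j], dist2[i, j][:, None]], axis=1
        )
        m = np.tanh(m_in @ w_msg)  # (k, d_msg)
        # Coordinates move along relative vectors scaled by invariant weights,
        # so the update rotates and translates together with the input.
        x_new[i] = x[i] + (diff[i, j] * (m @ w_coord)).mean(axis=0)
        h_new[i] = h[i] + np.tanh(np.concatenate([h[i], m.sum(axis=0)]) @ w_node)
    return h_new, x_new


def interleaved_block(h, x, params, k=4):
    """Hypothetical block: global attention, then a local equivariant layer."""
    h = self_attention(h, *params["attn"])
    return knn_equivariant_layer(h, x, k, *params["egnn"])


# Usage with random weights: compare outputs for original vs. rotated+shifted input.
rng = np.random.default_rng(0)
n, d, d_msg = 10, 8, 16
params = {
    "attn": [0.1 * rng.normal(size=(d, d)) for _ in range(3)],
    "egnn": [
        0.1 * rng.normal(size=(2 * d + 1, d_msg)),
        0.1 * rng.normal(size=(d_msg, 1)),
        0.1 * rng.normal(size=(d + d_msg, d)),
    ],
}
h0 = rng.normal(size=(n, d))
x0 = 5.0 * rng.normal(size=(n, 3))
rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
shift = np.array([1.0, -2.0, 0.5])

h1, x1 = interleaved_block(h0, x0, params)
h2, x2 = interleaved_block(h0, x0 @ rot.T + shift, params)
print(np.allclose(h1, h2), np.allclose(x1 @ rot.T + shift, x2))
```

The design choice that makes the local layer equivariant is that messages depend only on invariant quantities (features and squared distances). As a result, rotating or translating the input coordinates transforms the predicted coordinates in the same way and leaves the residue features unchanged, which the final comparison checks.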
Related papers
- EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics
Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology.
Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions.
We introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets.
Date: 2024-10-01T02:04:01Z
- ReactZyme: A Benchmark for Enzyme-Reaction Prediction
We introduce a new approach to annotating enzymes based on their catalyzed reactions.
We employ machine learning algorithms to analyze enzyme reaction datasets.
We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions.
Date: 2024-08-24T19:19:33Z
- Clustering for Protein Representation Learning
We propose a neural clustering framework that can automatically discover the critical components of a protein.
Our framework treats a protein as a graph, where each node represents an amino acid and each edge represents a spatial or sequential connection between amino acids.
We evaluate on four protein-related tasks: protein fold classification, enzyme reaction classification, Gene Ontology (GO) term prediction, and enzyme commission (EC) number prediction.
Date: 2024-03-30T05:51:09Z
- Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design
We propose NAEPro, a model that jointly designs protein sequence and structure based on automatically detected functional sites.
NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence.
Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors.
Date: 2023-10-06T16:08:41Z
- A Latent Diffusion Model for Protein Structure Generation
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
Date: 2023-05-06T19:10:19Z
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
Date: 2022-09-30T01:46:38Z
- Machine learning modeling of family wide enzyme-substrate specificity screens
Biocatalysis is a promising approach to synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale.
The adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
Date: 2021-09-08T19:44:42Z
- BERTology Meets Biology: Interpreting Attention in Protein Language Models
We demonstrate methods for analyzing protein Transformer models through the lens of attention.
We show that attention captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure.
We also present a three-dimensional visualization of the interaction between attention and protein structure.
Date: 2020-06-26T21:50:17Z
- Enzyme promiscuity prediction using hierarchy-informed multi-label classification
We present and evaluate machine-learning models to predict which of 983 distinct enzymes are likely to interact with a query molecule.
Some interactions are attributed to natural selection and involve the enzyme's natural substrates.
The majority of the interactions, however, involve non-natural substrates, reflecting promiscuous enzymatic activities.
Date: 2020-02-18T01:39:24Z
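The NAEPro entry above reports amino acid recovery rate, TM-score and RMSD. Below is a minimal NumPy sketch of the first and last of these metrics. It assumes the designed and native sequences are already aligned position by position, and that RMSD is computed over matched coordinates (for example, one atom per residue) after optimal rigid superposition with the Kabsch algorithm. The function names are illustrative and do not come from any of the listed papers. TM-score is not shown because it additionally needs a length-dependent distance scale and a search over superpositions.

```python
import numpy as np


def amino_acid_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("expected two non-empty, equal-length aligned sequences")
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate sets after optimal rigid superposition."""
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)  # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation taking p_c onto q_c
    residual = p_c @ rot.T - q_c
    return float(np.sqrt((residual ** 2).sum(axis=1).mean()))


# Usage: a rigidly moved copy of a structure should superimpose with ~zero RMSD.
rng = np.random.default_rng(1)
native_xyz = rng.normal(size=(20, 3))
rot, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rot) < 0:
    rot[:, 0] *= -1  # make it a proper rotation (det = +1)
moved_xyz = native_xyz @ rot.T + np.array([1.0, -2.0, 3.0])

print(amino_acid_recovery("ACDEF", "ACDEY"))  # 0.8
print(kabsch_rmsd(moved_xyz, native_xyz) < 1e-8)  # True
```

The reflection correction (`d`) matters because an unconstrained least-squares fit could mirror a structure, which would understate RMSD for chiral molecules such as proteins.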