EnzyPGM: Pocket-conditioned Generative Model for Substrate-specific Enzyme Design
- URL: http://arxiv.org/abs/2601.19205v1
- Date: Tue, 27 Jan 2026 05:07:55 GMT
- Title: EnzyPGM: Pocket-conditioned Generative Model for Substrate-specific Enzyme Design
- Authors: Zefeng Lin, Zhihang Zhang, Weirong Zhu, Tongchang Han, Xianyong Fang, Tianfan Fu, Xiaohua Xu
- Abstract summary: EnzyPGM is a unified framework that jointly generates enzymes and substrate-binding pockets conditioned on functional priors and substrates. At its core, EnzyPGM includes two main modules: a Residue-atom Bi-scale Attention (RBA) that jointly models intra-residue dependencies and fine-grained interactions between pocket residues and substrate atoms, and a Residue Function Fusion (RFF) that incorporates enzyme function priors into residue representations.
- Score: 11.03225817843529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing enzymes with substrate-binding pockets is a critical challenge in protein engineering, as catalytic activity depends on the precise interaction between pockets and substrates. Generative models currently dominate functional protein design but cannot model pocket-substrate interactions, which limits the generation of enzymes with precise catalytic environments. To address this issue, we propose EnzyPGM, a unified framework that jointly generates enzymes and substrate-binding pockets conditioned on functional priors and substrates, with a particular focus on learning accurate pocket-substrate interactions. At its core, EnzyPGM includes two main modules: a Residue-atom Bi-scale Attention (RBA) that jointly models intra-residue dependencies and fine-grained interactions between pocket residues and substrate atoms, and a Residue Function Fusion (RFF) that incorporates enzyme function priors into residue representations. We also curate EnzyPock, an enzyme-pocket dataset comprising 83,062 enzyme-substrate pairs across 1,036 four-level enzyme families. Extensive experiments demonstrate that EnzyPGM achieves state-of-the-art performance on EnzyPock. Notably, EnzyPGM reduces the average binding energy by 0.47 kcal/mol relative to EnzyGen, showing its superior performance on substrate-specific enzyme design. The code and dataset will be released later.
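The abstract describes the RBA module only at a high level, so the following is a minimal illustrative sketch of the general idea of attending at two scales, not the authors' implementation. It assumes plain scaled dot-product attention, a residual update, and precomputed embeddings; the function names (`attention`, `bi_scale_attention`), the `pocket_mask` argument, and all array shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, weights)."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


def bi_scale_attention(residues, atoms, pocket_mask):
    """Hypothetical two-scale update (an assumption, not the paper's RBA).

    residues:    (n_res, d) residue embeddings
    atoms:       (n_atoms, d) substrate atom embeddings
    pocket_mask: (n_res,) boolean, True for pocket residues
    """
    # Residue scale: every residue attends to all residues.
    res_ctx, _ = attention(residues, residues, residues)
    updated = residues + res_ctx
    # Atom scale: only pocket residues attend to substrate atoms.
    atom_ctx, atom_weights = attention(residues[pocket_mask], atoms, atoms)
    updated[pocket_mask] += atom_ctx
    return updated, atom_weights


# Toy inputs: 6 residues (3 in the pocket), 4 substrate atoms, width 8.
rng = np.random.default_rng(0)
residues = rng.normal(size=(6, 8))
atoms = rng.normal(size=(4, 8))
pocket_mask = np.array([False, True, True, False, True, False])
updated, atom_weights = bi_scale_attention(residues, atoms, pocket_mask)
```

In this sketch, `atom_weights` has one row per pocket residue and one column per substrate atom, so each row can be read as how strongly that residue attends to each atom. A real model would add learned projections, multiple heads, and geometric features, none of which are specified in the abstract.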
Related papers
- EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation [27.86567331220053]
Current generative models excel in protein design but face limitations in binding data, substrate-specific control, and flexibility for de novo enzyme backbone generation. We propose EnzyControl, a method that enables functional and substrate-specific control in enzyme backbone generation. Our approach generates enzyme backbones conditioned on MSA-annotated catalytic sites and their corresponding substrates, which are automatically extracted from curated enzyme-substrate data.
arXiv Detail & Related papers (2025-10-29T03:22:32Z)
- Multimodal Regression for Enzyme Turnover Rates Prediction [57.60697333734054]
We propose a framework for predicting the enzyme turnover rate by integrating enzyme sequences, substrate structures, and environmental factors. Our model combines a pre-trained language model and a convolutional neural network to extract features from protein sequences. We leverage symbolic regression via Kolmogorov-Arnold Networks to explicitly learn mathematical formulas that govern the enzyme turnover rate.
arXiv Detail & Related papers (2025-09-15T11:07:26Z)
- PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [88.98041407783502]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z)
- OmniESI: A unified framework for enzyme-substrate interaction prediction with progressive conditional deep learning [46.402707495664174]
We introduce a two-stage progressive framework, OmniESI, for enzyme-substrate interaction prediction through conditional deep learning. We show that OmniESI consistently outperforms state-of-the-art specialized methods. Overall, OmniESI represents a unified predictive approach for enzyme-substrate interactions.
arXiv Detail & Related papers (2025-06-22T09:40:40Z)
- Reaction-conditioned De Novo Enzyme Design with GENzyme [64.14088142258498]
GENzyme is a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-stage model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes.
arXiv Detail & Related papers (2024-11-10T00:37:26Z)
- EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics [51.47520281819253]
Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology.
Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions.
We introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets.
arXiv Detail & Related papers (2024-10-01T02:04:01Z)
- Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates [16.5169461287914]
We propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families.
Our key idea is to generate an enzyme's amino acid sequence and its 3D coordinates based on functionally important sites and substrates corresponding to a desired catalytic function.
arXiv Detail & Related papers (2024-05-13T21:48:48Z)
- Machine learning modeling of family wide enzyme-substrate specificity screens [2.276367922551686]
Biocatalysis is a promising approach to synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale.
The adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
arXiv Detail & Related papers (2021-09-08T19:44:42Z)
- CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.