Fused Gromov-Wasserstein Contrastive Learning for Effective Enzyme-Reaction Screening
- URL: http://arxiv.org/abs/2512.08508v1
- Date: Tue, 09 Dec 2025 11:49:24 GMT
- Title: Fused Gromov-Wasserstein Contrastive Learning for Effective Enzyme-Reaction Screening
- Authors: Gengmo Zhou, Feng Yu, Wenda Wang, Zhifeng Gao, Guolin Ke, Zhewei Wei, Zhen Wang
- Abstract summary: FGW-CLIP is a contrastive learning framework based on optimizing the Gromov-Wasserstein distance. FGW-CLIP consistently outperforms across all three splits of ReactZyme, the largest enzyme-reaction benchmark. These results position FGW-CLIP as a promising framework for enzyme discovery in complex biochemical settings.
- Score: 32.25999474073762
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Enzymes are crucial catalysts that enable a wide range of biochemical reactions. Efficiently identifying specific enzymes from vast protein libraries is essential for advancing biocatalysis. Traditional computational methods for enzyme screening and retrieval are time-consuming and resource-intensive. Recently, deep learning approaches have shown promise. However, these methods focus solely on the interaction between enzymes and reactions, overlooking the inherent hierarchical relationships within each domain. To address these limitations, we introduce FGW-CLIP, a novel contrastive learning framework based on optimizing the fused Gromov-Wasserstein distance. FGW-CLIP incorporates multiple alignments, including inter-domain alignment between reactions and enzymes and intra-domain alignment within enzymes and reactions. By introducing a tailored regularization term, our method minimizes the Gromov-Wasserstein distance between enzyme and reaction spaces, which enhances information integration across these domains. Extensive evaluations demonstrate the superiority of FGW-CLIP in challenging enzyme-reaction tasks. On the widely-used EnzymeMap benchmark, FGW-CLIP achieves state-of-the-art performance in enzyme virtual screening, as measured by BEDROC and EF metrics. Moreover, FGW-CLIP consistently outperforms across all three splits of ReactZyme, the largest enzyme-reaction benchmark, demonstrating robust generalization to novel enzymes and reactions. These results position FGW-CLIP as a promising framework for enzyme discovery in complex biochemical settings, with strong adaptability across diverse screening scenarios.
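The fused Gromov-Wasserstein quantity named in the abstract can be illustrated with a minimal NumPy sketch. It uses the standard textbook definition of the objective for a fixed coupling, not the paper's implementation: a feature term compares enzymes with reactions (inter-domain), and a structure term compares pairwise distances inside each domain (intra-domain). The function names `pairwise_sq_dists` and `fgw_objective`, the trade-off weight `alpha`, the random toy embeddings, and the uniform coupling are all assumptions made for illustration; the abstract does not specify how FGW-CLIP builds its costs or its regularization term.

```python
import numpy as np


def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between rows of A (n x d) and rows of B (m x d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def fgw_objective(M, C1, C2, T, alpha=0.5):
    """Standard fused Gromov-Wasserstein objective for a fixed coupling T.

    M  : (n, m) cross-domain feature cost (here: enzyme vs. reaction).
    C1 : (n, n) intra-domain structure matrix of the first domain.
    C2 : (m, m) intra-domain structure matrix of the second domain.
    T  : (n, m) coupling whose entries sum to 1.
    alpha : hypothetical trade-off weight between the two terms.
    """
    # Feature (Wasserstein) term: sum_ij M_ij * T_ij
    w_term = np.sum(M * T)

    # Structure (Gromov-Wasserstein) term:
    #   sum_ijkl (C1_ik - C2_jl)^2 * T_ij * T_kl
    # expanded so that no 4-index tensor is materialized.
    p = T.sum(axis=1)  # marginal over the first domain
    q = T.sum(axis=0)  # marginal over the second domain
    gw_term = (
        p @ (C1 ** 2) @ p
        + q @ (C2 ** 2) @ q
        - 2.0 * np.sum((C1 @ T @ C2.T) * T)
    )
    return (1.0 - alpha) * w_term + alpha * gw_term


# Toy data (assumed): 4 "enzyme" and 5 "reaction" embeddings in a shared 8-d space.
rng = np.random.default_rng(0)
enzymes = rng.normal(size=(4, 8))
reactions = rng.normal(size=(5, 8))

M = pairwise_sq_dists(enzymes, reactions)
C1 = pairwise_sq_dists(enzymes, enzymes)
C2 = pairwise_sq_dists(reactions, reactions)
T = np.full((4, 5), 1.0 / 20)  # uniform coupling, held fixed for illustration

value = fgw_objective(M, C1, C2, T, alpha=0.5)
print(f"FGW objective under a uniform coupling: {value:.4f}")
```

In practice the coupling is itself optimized rather than held fixed (optimal transport libraries such as POT provide solvers for this); the sketch only evaluates the objective so that the two alignment terms are visible side by side.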
Related papers
- Multimodal Regression for Enzyme Turnover Rates Prediction [57.60697333734054]
We propose a framework for predicting the enzyme turnover rate by integrating enzyme sequences, substrate structures, and environmental factors. Our model combines a pre-trained language model and a convolutional neural network to extract features from protein sequences. We leverage symbolic regression via Kolmogorov-Arnold Networks to explicitly learn mathematical formulas that govern the enzyme turnover rate.
arXiv Detail & Related papers (2025-09-15T11:07:26Z)
- OmniESI: A unified framework for enzyme-substrate interaction prediction with progressive conditional deep learning [46.402707495664174]
We introduce a two-stage progressive framework, OmniESI, for enzyme-substrate interaction prediction through conditional deep learning. We show that OmniESI consistently delivered performance superior to state-of-the-art specialized methods. Overall, OmniESI represents a unified predictive approach for enzyme-substrate interactions.
arXiv Detail & Related papers (2025-06-22T09:40:40Z)
- Interpretable Enzyme Function Prediction via Residue-Level Detection [58.30647671797602]
We present an attention-based framework, namely ProtDETR, for enzyme function prediction. It uses a set of learnable functional queries to adaptively extract different local representations from the sequence of residue-level features. ProtDETR significantly outperforms existing deep learning-based enzyme function prediction methods.
arXiv Detail & Related papers (2025-01-10T01:02:43Z)
- Reaction-conditioned De Novo Enzyme Design with GENzyme [64.14088142258498]
GENzyme is a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-staged model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes.
arXiv Detail & Related papers (2024-11-10T00:37:26Z)
- EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics [51.47520281819253]
Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology.
Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions.
We introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets.
arXiv Detail & Related papers (2024-10-01T02:04:01Z)
- ReactZyme: A Benchmark for Enzyme-Reaction Prediction [41.33939896203491]
We introduce a new approach to annotating enzymes based on their catalyzed reactions.
We employ machine learning algorithms to analyze enzyme reaction datasets.
We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions.
arXiv Detail & Related papers (2024-08-24T19:19:33Z)
- Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates [16.5169461287914]
We propose EnzyGen, an approach to learn a unified model to design enzymes across all functional families.
Our key idea is to generate an enzyme's amino acid sequence and their 3D coordinates based on functionally important sites and substrates corresponding to a desired catalytic function.
arXiv Detail & Related papers (2024-05-13T21:48:48Z)