EnzyCLIP: A Cross-Attention Dual Encoder Framework with Contrastive Learning for Predicting Enzyme Kinetic Constants
- URL: http://arxiv.org/abs/2512.00379v1
- Date: Sat, 29 Nov 2025 08:13:06 GMT
- Title: EnzyCLIP: A Cross-Attention Dual Encoder Framework with Contrastive Learning for Predicting Enzyme Kinetic Constants
- Authors: Anas Aziz Khan, Md Shah Fahad, Priyanka, Ramesh Chandra, Guransh Singh
- Abstract summary: We present EnzyCLIP, a novel dual-encoder framework to predict enzyme kinetic parameters from protein sequences and substrate molecular structures.
The model is trained on the CatPred-DB database containing 23,151 Kcat and 41,174 Km experimentally validated measurements.
XGBoost ensemble methods applied to the learned embeddings further improved Km prediction (R2 = 0.61) while maintaining robust Kcat performance.
- Score: 2.92594095183629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate prediction of enzyme kinetic parameters is crucial for drug discovery, metabolic engineering, and synthetic biology applications. Current computational approaches face limitations in capturing complex enzyme-substrate interactions and often focus on single parameters while neglecting the joint prediction of catalytic turnover numbers (Kcat) and Michaelis-Menten constants (Km). We present EnzyCLIP, a novel dual-encoder framework that leverages contrastive learning and cross-attention mechanisms to predict enzyme kinetic parameters from protein sequences and substrate molecular structures. Our approach integrates ESM-2 protein language model embeddings with ChemBERTa chemical representations through a CLIP-inspired architecture enhanced with bidirectional cross-attention for dynamic enzyme-substrate interaction modeling. EnzyCLIP combines InfoNCE contrastive loss with Huber regression loss to learn aligned multimodal representations while predicting log10-transformed kinetic parameters. The model was trained on the CatPred-DB database, containing 23,151 experimentally validated Kcat measurements and 41,174 Km measurements, and achieved competitive performance with R2 scores of 0.593 for Kcat and 0.607 for Km prediction. XGBoost ensemble methods applied to the learned embeddings further improved Km prediction (R2 = 0.61) while maintaining robust Kcat performance.
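The abstract's training objective, a CLIP-style InfoNCE contrastive loss over matched enzyme/substrate embedding pairs plus a Huber regression loss on log10-transformed kinetic parameters, can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation; the temperature, Huber delta, and weighting `lam` are assumed hyperparameters.

```python
import numpy as np

def info_nce(enzyme_emb, substrate_emb, temperature=0.07):
    """Symmetric InfoNCE over matched enzyme/substrate pairs (CLIP-style)."""
    # L2-normalize both sets of embeddings
    e = enzyme_emb / np.linalg.norm(enzyme_emb, axis=1, keepdims=True)
    s = substrate_emb / np.linalg.norm(substrate_emb, axis=1, keepdims=True)
    logits = e @ s.T / temperature  # (N, N); matched pairs sit on the diagonal

    def xent_diag(l):
        # cross-entropy with the diagonal as the positive class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # symmetric loss: enzyme-to-substrate and substrate-to-enzyme directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

def huber(pred, target, delta=1.0):
    """Huber regression loss on log10-transformed kinetic parameters."""
    r = np.abs(pred - target)
    return np.mean(np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta)))

def combined_loss(enz_emb, sub_emb, pred_log10, target_log10, lam=1.0):
    """Contrastive alignment plus regression, weighted by lam (assumed)."""
    return info_nce(enz_emb, sub_emb) + lam * huber(pred_log10, target_log10)
```

In this sketch the contrastive term pulls each enzyme embedding toward its paired substrate embedding while pushing it away from the other substrates in the batch, and the Huber term keeps the regression head robust to the heavy-tailed errors typical of kinetic measurements.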
Related papers
- From Static Spectra to Operando Infrared Dynamics: Physics Informed Flow Modeling and a Benchmark [67.29937933325849]
Operando IR Prediction aims to forecast the time-resolved evolution of "spectral fingerprints" from a single static spectrum.
OpIRSpec-7K comprises 7,118 high-quality samples across 10 distinct battery systems.
ABCC significantly outperforms state-of-the-art static, sequential, and generative baselines.
arXiv Detail & Related papers (2026-02-20T18:58:43Z) - Tensor-DTI: Enhancing Biomolecular Interaction Prediction with Contrastive Embedding Learning [0.015229507502478598]
We propose a contrastive learning framework that integrates multimodal embeddings from molecular graphs, protein language models, and binding-site predictions to improve interaction modeling.
Our findings highlight the benefits of integrating multimodal information with contrastive objectives to enhance interaction-prediction accuracy and to provide more interpretable and reliability-aware models for virtual screening.
arXiv Detail & Related papers (2026-01-09T13:39:49Z) - Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction [0.22369578015657954]
The trade-off between predictive accuracy and data availability makes it difficult to predict protein-protein binding affinity accurately.
We suggest a regression framework based on knowledge distillation that uses protein structural data during training and needs only sequence data during inference.
arXiv Detail & Related papers (2026-01-07T08:43:08Z) - Synergistic Computational Approaches for Accelerated Drug Discovery: Integrating Quantum Mechanics, Statistical Thermodynamics, and Quantum Computing [0.0]
Accurately predicting protein-ligand binding free energies (Bs) remains a central challenge in drug discovery.
Here, we introduce a hybrid quantum-classical framework that combines Mining Minima sampling with quantum mechanically refined ligand partial charges.
Across 23 protein targets and 543, the method achieves a mean absolute error of about 1.10 kcal/mol with strong rank-order fidelity.
arXiv Detail & Related papers (2025-12-05T20:47:34Z) - Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning [0.0]
Predicting the potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1) is a critical challenge in early drug discovery.
We present a deep learning framework for the quantitative regression of pIC50 values using fine-tuned variants of ChemBERTa.
Our approach outperforms classical baselines, including a random predictor, in both regression accuracy and virtual screening utility.
arXiv Detail & Related papers (2025-12-03T20:42:22Z) - Multimodal Regression for Enzyme Turnover Rates Prediction [57.60697333734054]
We propose a framework for predicting the enzyme turnover rate by integrating enzyme sequences, substrate structures, and environmental factors.
Our model combines a pre-trained language model and a convolutional neural network to extract features from protein sequences.
We leverage symbolic regression via Kolmogorov-Arnold Networks to explicitly learn mathematical formulas that govern the enzyme turnover rate.
arXiv Detail & Related papers (2025-09-15T11:07:26Z) - KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction [0.0]
KinForm is a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters.
We observe improvements from binding-site probability pooling, intermediate-layer selection, PCA, and oversampling of low-identity proteins.
arXiv Detail & Related papers (2025-07-19T14:34:57Z) - OmniESI: A unified framework for enzyme-substrate interaction prediction with progressive conditional deep learning [46.402707495664174]
We introduce OmniESI, a two-stage progressive framework for enzyme-substrate interaction prediction through conditional deep learning.
We show that OmniESI consistently delivers superior performance compared with state-of-the-art specialized methods.
Overall, OmniESI represents a unified predictive approach for enzyme-substrate interactions.
arXiv Detail & Related papers (2025-06-22T09:40:40Z) - Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches [48.66541987908136]
Much work has been devoted to predicting binding affinity over the past decades.
We note growing use of both traditional machine learning and deep learning models for predicting binding affinity.
With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction.
arXiv Detail & Related papers (2024-09-30T03:40:49Z) - YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often miss complex molecular structures, leading to inaccuracies.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
arXiv Detail & Related papers (2024-06-27T12:40:29Z) - State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - From Static to Dynamic Structures: Improving Binding Affinity Prediction with Graph-Based Deep Learning [40.83037811977803]
Dynaformer is a graph-based deep learning model developed to predict protein-ligand binding affinities.
It exhibits state-of-the-art scoring and ranking power on the CASF-2016 benchmark dataset.
In a virtual screening on heat shock protein 90 (HSP90), 20 candidates are identified and their binding affinities are experimentally validated.
arXiv Detail & Related papers (2022-08-19T14:55:12Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.