Related papers: Machine learning modeling of family wide enzyme-substrate specificity screens

URL: http://arxiv.org/abs/2109.03900v1
Date: Wed, 8 Sep 2021 19:44:42 GMT
Title: Machine learning modeling of family wide enzyme-substrate specificity screens
Authors: Samuel Goldman, Ria Das, Kevin K. Yang, Connor W. Coley
Abstract summary: Biocatalysis is a promising approach to synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. The adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
Score: 2.276367922551686
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.
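The "no-interaction" baseline mentioned in the abstract, a collection of independent single-task models that each see only enzyme features, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the paper: the function names, the toy feature vectors, and the choice of a nearest-neighbor predictor. The paper's actual featurizations and model classes are not described in this listing.

```python
from math import dist

def fit_single_task_models(enzyme_features, activity):
    """Build one independent enzyme-only model per substrate.

    enzyme_features: {enzyme_id: feature tuple}   (hypothetical featurization)
    activity: {substrate_id: {enzyme_id: measured activity}}
    Each "model" is simply the stored training set for that substrate,
    so no information is shared across substrates.
    """
    models = {}
    for substrate, labels in activity.items():
        models[substrate] = [(enzyme_features[e], y) for e, y in labels.items()]
    return models

def predict(models, substrate, query_features, k=1):
    """Predict activity for a new enzyme on a known substrate by averaging
    the labels of the k nearest training enzymes (Euclidean distance)."""
    train = models[substrate]
    nearest = sorted(train, key=lambda pair: dist(pair[0], query_features))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy data, invented for illustration only.
enzyme_features = {"E1": (0.0, 0.0), "E2": (1.0, 1.0), "E3": (0.9, 1.1)}
activity = {
    "S1": {"E1": 0.0, "E2": 1.0},
    "S2": {"E1": 1.0, "E2": 0.0},
}
models = fit_single_task_models(enzyme_features, activity)
prediction_s1 = predict(models, "S1", enzyme_features["E3"])
prediction_s2 = predict(models, "S2", enzyme_features["E3"])
```

Because each substrate gets its own model, no compound-protein interaction term is ever learned. That is what makes this kind of model a useful reference point when judging whether a joint CPI model has learned anything beyond per-task structure.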

Related papers

Phi-Former: A Pairwise Hierarchical Approach for Compound-Protein Interactions Prediction [12.813544613908588]
Drug discovery remains time-consuming, labor-intensive, and expensive. Predicting compound-protein interactions (CPIs) is a critical component in this process. Recent deep learning methods have successfully modeled CPIs at the atomic level. We propose Phi-former, a pairwise hierarchical interaction representation learning method.
arXiv Detail & Related papers (2026-02-05T09:39:22Z)
Rep3Net: An Approach Exploiting Multimodal Representation for Molecular Bioactivity Prediction [0.8049701904919515]
In early-stage drug discovery, bioactivity prediction of molecules against target proteins plays a crucial role. We propose Rep3Net, a unified deep learning architecture that not only incorporates descriptor data but also includes spatial and relational information. Our model, employing multimodal features, produces reliable bioactivity predictions on the Poly [ADP-ribose] polymerase 1 dataset.
arXiv Detail & Related papers (2025-11-29T15:39:48Z)
Learning Cell-Aware Hierarchical Multi-Modal Representations for Robust Molecular Modeling [74.25438319700929]
We propose CHMR (Cell-aware Hierarchical Multi-modal Representations), a robust framework that models local-global dependencies between molecules and cellular responses. Evaluated on nine public benchmarks spanning 728 tasks, CHMR outperforms state-of-the-art baselines. Results demonstrate the advantage of hierarchy-aware, multimodal learning for reliable and biologically grounded molecular representations.
arXiv Detail & Related papers (2025-11-26T07:15:00Z)
Multimodal Regression for Enzyme Turnover Rates Prediction [57.60697333734054]
We propose a framework for predicting the enzyme turnover rate by integrating enzyme sequences, substrate structures, and environmental factors. Our model combines a pre-trained language model and a convolutional neural network to extract features from protein sequences. We leverage symbolic regression via Kolmogorov-Arnold Networks to explicitly learn mathematical formulas that govern the enzyme turnover rate.
arXiv Detail & Related papers (2025-09-15T11:07:26Z)
Conditional Chemical Language Models are Versatile Tools in Drug Discovery [0.0]
We present SAFE-T, a chemical modeling framework that conditions on biological context to prioritize molecules. It supports principled scoring of molecules across tasks such as virtual screening, drug-target interaction prediction, and activity cliff detection. It consistently achieves performance comparable to or better than existing approaches.
arXiv Detail & Related papers (2025-07-14T13:42:39Z)
PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z)
OmniESI: A unified framework for enzyme-substrate interaction prediction with progressive conditional deep learning [46.402707495664174]
We introduce a two-stage progressive framework, OmniESI, for enzyme-substrate interaction prediction through conditional deep learning. We show that OmniESI consistently delivers performance superior to that of state-of-the-art specialized methods. Overall, OmniESI represents a unified predictive approach for enzyme-substrate interactions.
arXiv Detail & Related papers (2025-06-22T09:40:40Z)
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery [32.573496601865465]
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications.
arXiv Detail & Related papers (2025-03-06T12:04:56Z)
Chemical knowledge-informed framework for privacy-aware retrosynthesis learning [60.93245342663455]
Current machine learning-based retrosynthesis gathers reaction data from multiple sources into one single edge to train prediction models. This paradigm poses considerable privacy risks as it necessitates broad data availability across organizational boundaries. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models.
arXiv Detail & Related papers (2025-02-26T13:13:24Z)
Inductive-Associative Meta-learning Pipeline with Human Cognitive Patterns for Unseen Drug-Target Interaction Prediction [13.23471591766483]
BioBridge predicts novel drug-target interactions using limited sequence data. It incorporates multi-level encoders with adversarial training to accumulate transferable binding principles. It proves effective for virtual screening of the epidermal growth factor receptor and adenosine receptor, underscoring its potential in drug discovery.
arXiv Detail & Related papers (2025-01-26T08:22:22Z)
EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics [51.47520281819253]
Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology. Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions. We introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets.
arXiv Detail & Related papers (2024-10-01T02:04:01Z)
Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches [48.66541987908136]
Much work has been devoted to predicting binding affinity over the past decades. We note growing use of both traditional machine learning and deep learning models for predicting binding affinity. With improved predictive performance and the FDA's phasing out of animal testing, AI-driven in silico models, such as AI virtual cells (AIVCs), are poised to advance binding affinity prediction.
arXiv Detail & Related papers (2024-09-30T03:40:49Z)
UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z)
Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations [14.528429119352328]
We introduce a novel pre-training strategy, substrate scope contrastive learning, which learns atomic representations tailored to chemical reactivity. We focus on 20,798 aryl halides in the CAS Content Collection spanning thousands of publications to learn a representation of aryl halide reactivity. This work not only presents a chemistry-tailored neural network pre-training strategy to learn reactivity-aligned atomic representations, but also marks a first-of-its-kind approach to benefit from the human bias in substrate scope design.
arXiv Detail & Related papers (2024-02-19T02:21:20Z)
Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction. BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner. It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z)
Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. Deep learning models have emerged as an efficient way to discover synergistic combinations. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
A biologically-inspired evaluation of molecular generative machine learning [17.623886600638716]
A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed. We propose a recreation metric, apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs.
arXiv Detail & Related papers (2022-08-20T11:01:10Z)
A knowledge graph representation learning approach to predict novel kinase-substrate interactions [0.0]
We present a knowledge graph representation learning approach to predict novel interaction partners for understudied kinases. Our approach uses a phosphoproteomic knowledge graph constructed by integrating data from iPTMnet, Protein Ontology, Gene Ontology and BioKG. We also present a post-predictive analysis of the predicted interactions and an ablation study of the phosphoproteomic knowledge graph to gain an insight into the biology of the understudied kinases.
arXiv Detail & Related papers (2022-06-05T23:55:40Z)
Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture. Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively. IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)
Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction [4.307720252429733]
In machine learning approaches, the numerical representation of molecules is critical to the performance of the model. We propose a self-attention-based multi-view representation learning approach for modeling drug-target interactions.
arXiv Detail & Related papers (2020-05-01T14:28:17Z)
CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme. CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)
Enzyme promiscuity prediction using hierarchy-informed multi-label classification [6.6828647808002595]
We present and evaluate machine-learning models to predict which of 983 distinct enzymes are likely to interact with a query molecule. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions however involve non-natural substrates, thus reflecting promiscuous enzymatic activities.
arXiv Detail & Related papers (2020-02-18T01:39:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.