Machine learning modeling of family wide enzyme-substrate specificity screens
- URL: http://arxiv.org/abs/2109.03900v1
- Date: Wed, 8 Sep 2021 19:44:42 GMT
- Title: Machine learning modeling of family wide enzyme-substrate specificity screens
- Authors: Samuel Goldman, Ria Das, Kevin K. Yang, Connor W. Coley
- Abstract summary: Biocatalysis is a promising approach to synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale.
The adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
- Score: 2.276367922551686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biocatalysis is a promising approach to sustainably synthesize
pharmaceuticals, complex natural products, and commodity chemicals at scale.
However, the adoption of biocatalysis is limited by our ability to select
enzymes that will catalyze their natural chemical transformation on non-natural
substrates. While machine learning and in silico directed evolution are
well-posed for this predictive modeling challenge, efforts to date have
primarily aimed to increase activity against a single known substrate, rather
than to identify enzymes capable of acting on new substrates of interest. To
address this need, we curate 6 different high-quality enzyme family screens
from the literature that each measure multiple enzymes against multiple
substrates. We compare machine learning-based compound-protein interaction
(CPI) modeling approaches from the literature used for predicting drug-target
interactions. Surprisingly, comparing these interaction-based models against
collections of independent (single task) enzyme-only or substrate-only models
reveals that current CPI approaches are incapable of learning interactions
between compounds and proteins in the current family level data regime. We
further validate this observation by demonstrating that our no-interaction
baseline can outperform CPI-based models from the literature used to guide the
discovery of kinase inhibitors. Given the high performance of non-interaction
based models, we introduce a new structure-based strategy for pooling residue
representations across a protein sequence. Altogether, this work motivates a
principled path forward in order to build and evaluate meaningful predictive
models for biocatalysis and other drug discovery applications.
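
The comparison described in the abstract, independent single-task models versus one joint model over enzyme-substrate pairs, can be illustrated with a minimal sketch. This is not the authors' code. The ridge regressor, the feature matrices, and the function names `fit_ridge`, `single_task_predict`, and `cpi_predict` are assumptions made for illustration. The joint model here is a plain concatenation of enzyme and substrate features, a simple stand-in for the CPI architectures evaluated in the paper.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def single_task_predict(enz_train, Y_train, enz_test, lam=1.0):
    """No-interaction baseline (hypothetical): one enzyme-only model per substrate.

    Y_train has shape (n_enzymes, n_substrates). With a shared lam, the
    closed-form solution decouples by column, so each substrate gets its
    own independent weight vector.
    """
    W = fit_ridge(enz_train, Y_train, lam)  # shape (d_enz, n_substrates)
    return enz_test @ W

def cpi_predict(enz_train, Y_train, enz_test, sub_feats, lam=1.0):
    """Joint model (hypothetical stand-in): one regressor over all pairs."""
    def pairs(enz):
        # Row i * m + j holds [enzyme i features, substrate j features].
        n, m = enz.shape[0], sub_feats.shape[0]
        return np.hstack([np.repeat(enz, m, axis=0),
                          np.tile(sub_feats, (n, 1))])
    w = fit_ridge(pairs(enz_train), Y_train.reshape(-1), lam)
    return (pairs(enz_test) @ w).reshape(enz_test.shape[0], -1)
```

Scoring both predictors on the same held-out enzymes gives the kind of side-by-side check the abstract argues for: a joint model is only worth its complexity if it beats the independent per-substrate models.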
Related papers
- Docking-Aware Attention: Dynamic Protein Representations through Molecular Context Integration [22.154465616964263]
We present Docking-Aware Attention (DAA), a novel architecture that generates dynamic, context-dependent protein representations.
We evaluate our method on enzymatic reaction prediction, where it outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2025-02-03T15:52:38Z) - Leveraging Induced Transferable Binding Principles for Associative Prediction of Novel Drug-Target Interactions [13.23471591766483]
BioBridge predicts novel drug-target interactions using limited sequence data.
It incorporates multi-level encoders with adversarial training to accumulate transferable binding principles.
It proves effective for virtual screening of the epidermal growth factor receptor and adenosine receptor, underscoring its potential in drug discovery.
arXiv Detail & Related papers (2025-01-26T08:22:22Z) - MIN: Multi-channel Interaction Network for Drug-Target Interaction with Protein Distillation [64.4838301776267]
Multi-channel Interaction Network (MIN) is a novel framework designed to predict drug-target interaction (DTI).
MIN incorporates a representation learning module and a multi-channel interaction module.
MIN is not only a potent tool for DTI prediction but also offers fresh insights into the prediction of protein binding sites.
arXiv Detail & Related papers (2024-11-23T05:38:36Z) - EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics [51.47520281819253]
Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology.
Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions.
We introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets.
arXiv Detail & Related papers (2024-10-01T02:04:01Z) - UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction.
By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z) - Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction.
BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner.
It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z) - A biologically-inspired evaluation of molecular generative machine learning [17.623886600638716]
A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed.
We propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for evaluating generative outputs.
arXiv Detail & Related papers (2022-08-20T11:01:10Z) - Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms the second-best state-of-the-art approach by 9.1% and 20.5% on binding activity and binding pose prediction, respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z) - Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction [4.307720252429733]
In machine learning approaches, the numerical representation of molecules is critical to the performance of the model.
We propose a self-attention-based multi-view representation learning approach for modeling drug-target interactions.
arXiv Detail & Related papers (2020-05-01T14:28:17Z) - CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z) - Enzyme promiscuity prediction using hierarchy-informed multi-label classification [6.6828647808002595]
We present and evaluate machine-learning models to predict which of 983 distinct enzymes are likely to interact with a query molecule.
Some interactions are attributed to natural selection and involve the enzyme's natural substrates.
The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities.
arXiv Detail & Related papers (2020-02-18T01:39:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.