Machine learning modeling of family wide enzyme-substrate specificity
screens
- URL: http://arxiv.org/abs/2109.03900v1
- Date: Wed, 8 Sep 2021 19:44:42 GMT
- Title: Machine learning modeling of family wide enzyme-substrate specificity screens
- Authors: Samuel Goldman, Ria Das, Kevin K. Yang, Connor W. Coley
- Abstract summary: Biocatalysis is a promising approach to synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale.
The adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
- Score: 2.276367922551686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biocatalysis is a promising approach to sustainably synthesize
pharmaceuticals, complex natural products, and commodity chemicals at scale.
However, the adoption of biocatalysis is limited by our ability to select
enzymes that will catalyze their natural chemical transformation on non-natural
substrates. While machine learning and in silico directed evolution are
well-posed for this predictive modeling challenge, efforts to date have
primarily aimed to increase activity against a single known substrate, rather
than to identify enzymes capable of acting on new substrates of interest. To
address this need, we curate 6 different high-quality enzyme family screens
from the literature that each measure multiple enzymes against multiple
substrates. We compare machine learning-based compound-protein interaction
(CPI) modeling approaches from the literature used for predicting drug-target
interactions. Surprisingly, comparing these interaction-based models against
collections of independent (single task) enzyme-only or substrate-only models
reveals that current CPI approaches are incapable of learning interactions
between compounds and proteins in the current family level data regime. We
further validate this observation by demonstrating that our no-interaction
baseline can outperform CPI-based models from the literature used to guide the
discovery of kinase inhibitors. Given the high performance of non-interaction
based models, we introduce a new structure-based strategy for pooling residue
representations across a protein sequence. Altogether, this work motivates a
principled path forward in order to build and evaluate meaningful predictive
models for biocatalysis and other drug discovery applications.
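The "no-interaction" baseline described in the abstract can be illustrated with a minimal sketch: one independent (single-task) model per enzyme, each of which sees substrate features only and never enzyme features. This is not the authors' implementation. The nearest-neighbor predictor, the Tanimoto similarity on sets of fingerprint bits, the toy screen, and all names (`SingleTaskKNN`, `fit_no_interaction_baseline`, `enzA`, `enzB`) are assumptions chosen for illustration.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class SingleTaskKNN:
    """Hypothetical single-task model: substrate fingerprint -> activity for ONE enzyme."""

    def __init__(self, k=1):
        self.k = k
        self.examples = []  # list of (fingerprint, activity) pairs

    def fit(self, fingerprints, activities):
        self.examples = list(zip(fingerprints, activities))
        return self

    def predict(self, fingerprint):
        # Rank this enzyme's training substrates by similarity to the query
        # and average the activities of the k most similar ones.
        ranked = sorted(
            self.examples,
            key=lambda ex: tanimoto(ex[0], fingerprint),
            reverse=True,
        )
        top = ranked[: self.k]
        return sum(activity for _, activity in top) / len(top)


def fit_no_interaction_baseline(screen, k=1):
    """Fit one independent model per enzyme.

    screen: iterable of (enzyme_id, substrate_fingerprint, activity).
    No enzyme features are used, so no enzyme-substrate interaction is modeled.
    """
    per_enzyme = defaultdict(lambda: ([], []))
    for enzyme_id, fingerprint, activity in screen:
        per_enzyme[enzyme_id][0].append(fingerprint)
        per_enzyme[enzyme_id][1].append(activity)
    return {
        enzyme_id: SingleTaskKNN(k).fit(fps, ys)
        for enzyme_id, (fps, ys) in per_enzyme.items()
    }


# Toy screen (made-up data): two enzymes, two substrates, binary activity.
screen = [
    ("enzA", frozenset({1, 2, 3}), 1.0),
    ("enzA", frozenset({7, 8}), 0.0),
    ("enzB", frozenset({1, 2, 3}), 0.0),
    ("enzB", frozenset({7, 8}), 1.0),
]
models = fit_no_interaction_baseline(screen, k=1)

# Predict activity of every enzyme on an unseen substrate.
query = frozenset({1, 2, 4})
predictions = {enzyme_id: model.predict(query) for enzyme_id, model in models.items()}
print(predictions)
```

A CPI model would instead take the enzyme and the substrate as joint input to a single model. Comparing it against a collection like `models` on held-out substrates is the kind of check the abstract describes.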
Related papers
- EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics [51.47520281819253]
Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology.
Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions.
We introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets.
arXiv Detail & Related papers (2024-10-01T02:04:01Z)
- UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction.
By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z)
- Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations [14.528429119352328]
We introduce a novel pre-training strategy, substrate scope contrastive learning, which learns atomic representations tailored to chemical reactivity.
We focus on 20,798 aryl halides in the CAS Content Collection spanning thousands of publications to learn a representation of aryl halide reactivity.
This work not only presents a chemistry-tailored neural network pre-training strategy to learn reactivity-aligned atomic representations, but also marks a first-of-its-kind approach to benefit from the human bias in substrate scope design.
arXiv Detail & Related papers (2024-02-19T02:21:20Z)
- Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction.
BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner.
It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- A biologically-inspired evaluation of molecular generative machine learning [17.623886600638716]
A novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed.
We propose a recreation metric, apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs.
arXiv Detail & Related papers (2022-08-20T11:01:10Z)
- A knowledge graph representation learning approach to predict novel kinase-substrate interactions [0.0]
We present a knowledge graph representation learning approach to predict novel interaction partners for understudied kinases.
Our approach uses a phosphoproteomic knowledge graph constructed by integrating data from iPTMnet, Protein Ontology, Gene Ontology and BioKG.
We also present a post-predictive analysis of the predicted interactions and an ablation study of the phosphoproteomic knowledge graph to gain an insight into the biology of the understudied kinases.
arXiv Detail & Related papers (2022-06-05T23:55:40Z)
- Improved Drug-target Interaction Prediction with Intermolecular Graph Transformer [98.8319016075089]
We propose a novel approach to model intermolecular information with a three-way Transformer-based architecture.
Intermolecular Graph Transformer (IGT) outperforms state-of-the-art approaches by 9.1% and 20.5% over the second best for binding activity and binding pose prediction respectively.
IGT exhibits promising drug screening ability against SARS-CoV-2 by identifying 83.1% active drugs that have been validated by wet-lab experiments with near-native predicted binding poses.
arXiv Detail & Related papers (2021-10-14T13:28:02Z)
- Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction [4.307720252429733]
In machine learning approaches, the numerical representation of molecules is critical to the performance of the model.
We propose a self-attention-based multi-view representation learning approach for modeling drug-target interactions.
arXiv Detail & Related papers (2020-05-01T14:28:17Z)
- CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)
- Enzyme promiscuity prediction using hierarchy-informed multi-label classification [6.6828647808002595]
We present and evaluate machine-learning models to predict which of 983 distinct enzymes are likely to interact with a query molecule.
Some interactions are attributed to natural selection and involve the enzyme's natural substrates.
The majority of the interactions however involve non-natural substrates, thus reflecting promiscuous enzymatic activities.
arXiv Detail & Related papers (2020-02-18T01:39:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.