Enzyme promiscuity prediction using hierarchy-informed multi-label
classification
- URL: http://arxiv.org/abs/2002.07327v2
- Date: Tue, 26 Jan 2021 03:01:52 GMT
- Authors: Gian Marco Visani, Michael C. Hughes, Soha Hassoun
- Abstract summary: We present and evaluate machine-learning models to predict which of 983 distinct enzymes are likely to interact with a query molecule.
Some interactions are attributed to natural selection and involve the enzyme's natural substrates.
The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities.
- Score: 6.6828647808002595
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As experimental efforts are costly and time consuming, computational
characterization of enzyme capabilities is an attractive alternative. We
present and evaluate several machine-learning models to predict which of 983
distinct enzymes, as defined by Enzyme Commission (EC) numbers, are likely
to interact with a given query molecule. Our data consists of enzyme-substrate
interactions from the BRENDA database. Some interactions are attributed to
natural selection and involve the enzyme's natural substrates. The majority of
the interactions, however, involve non-natural substrates, thus reflecting
promiscuous enzymatic activities. We frame this enzyme promiscuity prediction
problem as a multi-label classification task. We maximally utilize inhibitor
and unlabelled data to train prediction models that can take advantage of known
hierarchical relationships between enzyme classes. We report that a
hierarchical multi-label neural network, EPP-HMCNF, is the best model for
solving this problem, outperforming k-nearest neighbors similarity-based and
other machine learning models. We show that inhibitor information during
training consistently improves predictive power, particularly for EPP-HMCNF. We
also show that all promiscuity prediction models perform worse under a
realistic data split when compared to a random data split, and when evaluating
performance on non-natural substrates compared to natural substrates. We
provide Python code for EPP-HMCNF and other models in a repository termed EPP
(Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP.
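The multi-label framing over hierarchical EC classes described in the abstract can be illustrated with a small label-encoding sketch. This is not the EPP-HMCNF model or code from the EPP repository; it only shows, under stated assumptions, how a molecule's positive EC numbers could be expanded into a binary target vector that also marks every ancestor class. The function names (`ec_ancestors`, `build_label_index`, `encode_labels`) and the toy EC list are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def ec_ancestors(ec: str) -> List[str]:
    """Return an EC number together with its ancestors, most general first."""
    parts = ec.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_label_index(ec_numbers: List[str]) -> Dict[str, int]:
    """Assign one column index to every node of the EC hierarchy (assumed layout)."""
    nodes: Set[str] = set()
    for ec in ec_numbers:
        nodes.update(ec_ancestors(ec))
    # Order nodes by depth first, then by name, so parents precede children.
    ordered = sorted(nodes, key=lambda node: (node.count("."), node))
    return {node: i for i, node in enumerate(ordered)}


def encode_labels(positive_ecs: List[str], label_index: Dict[str, int]) -> List[int]:
    """Binary multi-label target: a positive leaf also switches on its ancestors."""
    target = [0] * len(label_index)
    for ec in positive_ecs:
        for node in ec_ancestors(ec):
            target[label_index[node]] = 1
    return target


# Toy example: three leaf enzymes (hypothetical selection, not the 983 from the paper).
leaf_ecs = ["1.1.1.1", "1.1.1.2", "2.7.1.1"]
label_index = build_label_index(leaf_ecs)
y = encode_labels(["1.1.1.1"], label_index)
```

A hierarchy-aware classifier could be trained against vectors like `y`, one per query molecule, so that predictions at the class, subclass, and sub-subclass levels can inform the leaf-level enzyme predictions.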
Related papers
- Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes [54.18828236350544]
Propensity score matching (PSM) addresses selection biases by selecting comparable populations for analysis.
Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria.
To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches.
arXiv Detail & Related papers (2024-07-20T12:42:24Z)
- Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information [5.456047952635665]
This work presents a unified, multi-source PPI corpora with vetted interaction definitions augmented by binary interaction type labels.
A Transformer-based deep learning method exploits entities' relational context information for relation representation to improve relation classification performance.
The model's performance is evaluated on four widely studied biomedical relation extraction datasets.
arXiv Detail & Related papers (2024-03-08T01:43:21Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction [4.916874464940376]
Computational protein-protein interaction (PPI) prediction techniques can contribute greatly to reducing time, cost, and false-positive interactions.
We have described our submitted solution with the results of the SeqPIP competition.
arXiv Detail & Related papers (2022-03-23T18:27:25Z)
- Active Learning by Feature Mixing [52.16150629234465]
We propose a novel method for batch active learning called ALFA-Mix.
We identify unlabelled instances with sufficiently-distinct features by seeking inconsistencies in predictions.
We show that inconsistencies in these predictions help discover features that the model is unable to recognise in the unlabelled instances.
arXiv Detail & Related papers (2022-03-14T12:20:54Z)
- Machine learning modeling of family wide enzyme-substrate specificity screens [2.276367922551686]
Biocatalysis is a promising approach to synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale.
The adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates.
arXiv Detail & Related papers (2021-09-08T19:44:42Z)
- DebiasedDTA: Model Debiasing to Boost Drug-Target Affinity Prediction [0.10499611180329804]
We present DebiasedDTA, the first model debiasing approach that avoids dataset biases in order to boost the affinity prediction on novel biomolecules.
The results show that DebiasedDTA can boost models while predicting the interactions between novel biomolecules.
The experiments also show that DebiasedDTA can augment the DTA prediction models of different input and model structures.
arXiv Detail & Related papers (2021-07-04T19:21:37Z)
- Bayesian neural network with pretrained protein embedding enhances prediction accuracy of drug-protein interaction [3.499870393443268]
Deep learning approaches can predict drug-protein interactions without trial-and-error by humans.
We propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset.
arXiv Detail & Related papers (2020-12-15T10:24:34Z)
- Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations [154.51144248210338]
Batch Nuclear-norm Maximization (BNM) is proposed to boost learning in label-insufficient scenarios.
BNM outperforms competitors and works well with existing well-known methods.
arXiv Detail & Related papers (2020-03-27T05:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.