Insights into performance evaluation of compound-protein interaction prediction methods
- URL: http://arxiv.org/abs/2202.00001v1
- Date: Fri, 28 Jan 2022 20:07:19 GMT
- Title: Insights into performance evaluation of compound-protein interaction prediction methods
- Authors: Adiba Yaseen (1), Imran Amin (2), Naeem Akhter (1), Asa Ben-Hur (3)
and Fayyaz Minhas (4) ((1) Department of Computer and Information Sciences
(DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS),
Islamabad, Pakistan, (2) National Institute for Biotechnology and Genetic
Engineering, Faisalabad, Pakistan, (3) Department of Computer Science,
Colorado State University, Fort Collins, USA, (4) Tissue Image Analytics
Centre, Department of Computer Science, University of Warwick, Coventry, UK)
- Abstract summary: Machine learning based prediction of compound-protein interactions (CPIs) is important for drug design, screening and repurposing studies.
We have observed a number of fundamental issues in experiment design that lead to overoptimistic estimates of model performance.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motivation: Machine learning based prediction of compound-protein
interactions (CPIs) is important for drug design, screening and repurposing
studies and can improve the efficiency and cost-effectiveness of wet lab
assays. Despite the publication of many research papers reporting CPI
predictors in recent years, we have observed a number of fundamental issues
in experiment design that lead to overoptimistic estimates of model
performance. Results: In this paper, we analyze the impact of several important
factors affecting the generalization performance of CPI predictors that are
overlooked in existing work: (1) similarity between training and test examples
in cross-validation; (2) the strategy for generating negative examples in the
absence of experimentally verified negative examples; and (3) the choice of
evaluation protocols and performance metrics and their alignment with
real-world use of CPI predictors in screening large compound libraries. Using
both an existing state-of-the-art method (CPI-NN) and a proposed kernel based
approach, we have found that assessment of predictive performance of CPI
predictors requires careful control over similarity between training and test
examples. We also show that random pairing for generating synthetic negative
examples for training and performance evaluation results in models with better
generalization performance in comparison to more sophisticated strategies used
in existing studies. Furthermore, we have found that our kernel based approach,
despite its simple design, exceeds the prediction performance of CPI-NN. We
have used the proposed model for compound screening of several proteins
including SARS-CoV-2 Spike and Human ACE2 proteins and found strong evidence in
support of its top hits. Availability: Code and raw experimental results are
available at https://github.com/adibayaseen/HKRCPI
Contact: Fayyaz.minhas@warwick.ac.uk
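
The abstract reports that random pairing is an effective way to generate synthetic negative examples. The snippet below is a minimal sketch of that general idea, not the authors' implementation (their code is in the repository linked above). The function name `random_negative_pairs`, the toy compound and protein identifiers, and the choice to treat every unobserved pair as a candidate negative are all illustrative assumptions.

```python
import random


def random_negative_pairs(positives, n_negatives, seed=0):
    """Sample (compound, protein) pairs that are absent from the known positives.

    Assumption for illustration: any pair not listed as a positive is treated
    as a candidate negative, although it may simply be untested.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    compounds = sorted({c for c, _ in positive_set})
    proteins = sorted({p for _, p in positive_set})

    # Guard against asking for more negatives than there are unobserved pairs.
    max_negatives = len(compounds) * len(proteins) - len(positive_set)
    if n_negatives > max_negatives:
        raise ValueError("not enough unobserved pairs to sample from")

    negatives = set()
    while len(negatives) < n_negatives:
        pair = (rng.choice(compounds), rng.choice(proteins))
        if pair not in positive_set:
            negatives.add(pair)
    return sorted(negatives)


# Hypothetical toy data: three known interacting pairs.
positives = [("aspirin", "P1"), ("ibuprofen", "P2"), ("caffeine", "P3")]
negatives = random_negative_pairs(positives, n_negatives=3)
```

Because the sampled pairs reuse the same compounds and proteins as the positives, a model cannot separate the classes simply by recognising which molecules appear only in negatives. Whether this matches the exact sampling scheme used in the paper is not stated in the abstract.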
Related papers
- C-XGBoost: A tree boosting model for causal effect estimation [8.246161706153805]
Causal effect estimation aims at estimating the Average Treatment Effect as well as the Conditional Average Treatment Effect of a treatment on an outcome from the available data.
We propose a new causal inference model, named C-XGBoost, for the prediction of potential outcomes.
arXiv Detail & Related papers (2024-03-31T17:43:37Z)
- The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes [30.30769701138665]
We introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data.
Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem.
We introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
arXiv Detail & Related papers (2024-02-14T03:43:05Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- On the Theories Behind Hard Negative Sampling for Recommendation [51.64626293229085]
We offer two insightful guidelines for effective usage of Hard Negative Sampling (HNS).
We prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC).
These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time.
arXiv Detail & Related papers (2023-02-07T13:57:03Z)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
- A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction [4.916874464940376]
Computational protein-protein interaction (PPI) prediction techniques can contribute greatly to reducing time, cost and false-positive interactions.
We have described our submitted solution with the results of the SeqPIP competition.
arXiv Detail & Related papers (2022-03-23T18:27:25Z)
- ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
- Practical Assessment of Generalization Performance Robustness for Deep Networks via Contrastive Examples [36.50563671470897]
Training images with data transformations have been suggested as contrastive examples to complement the testing set for generalization performance evaluation of deep neural networks (DNNs).
In this work, we propose a practical framework ContRE that uses Contrastive examples for DNN geneRalization performance Estimation.
arXiv Detail & Related papers (2021-06-20T08:46:01Z)
- Performance Evaluation of Adversarial Attacks: Discrepancies and Solutions [51.8695223602729]
Adversarial attack methods have been developed to challenge the robustness of machine learning models.
We propose a Piece-wise Sampling Curving (PSC) toolkit to effectively address the discrepancy.
PSC toolkit offers options for balancing the computational cost and evaluation effectiveness.
arXiv Detail & Related papers (2021-04-22T14:36:51Z)
- Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)
- Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India [3.3449509626538543]
We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments.
Key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units.
arXiv Detail & Related papers (2017-12-13T14:47:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.