Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain
- URL: http://arxiv.org/abs/2205.02169v1
- Date: Wed, 4 May 2022 16:36:24 GMT
- Title: Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain
- Authors: Kairi Furui, Masahito Ohue
- Abstract summary: Gradient boosting decision tree (GBDT)-based learning-to-rank methods have gained popularity recently.
Normalized Enrichment Discounted Cumulative Gain (NEDCG) aims to properly evaluate the goodness of ranking predictions.
NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-to-rank, a machine learning technique widely used in information
retrieval, has recently been applied to the problem of ligand-based virtual
screening, to accelerate the early stages of new drug development. Ranking
prediction models learn based on ordinal relationships, making them suitable
for integrating assay data from various environments. Existing studies of rank
prediction in compound screening have generally used a learning-to-rank method
called RankSVM. However, they have not been compared with or validated against
the gradient boosting decision tree (GBDT)-based learning-to-rank methods that
have gained popularity recently. Furthermore, although the ranking metric
Normalized Discounted Cumulative Gain (NDCG) is widely used in information
retrieval, it only indicates whether one model's predictions are better than
another's; in other words, NDCG cannot recognize when a prediction model
performs worse than random. Nevertheless, NDCG is still used for the
performance evaluation of compound screening with learning-to-rank. This study
applied GBDT models with the ranking loss functions lambdarank and lambdaloss
to ligand-based virtual screening, and the results were compared with existing
RankSVM methods and with GBDT models trained by regression.
We also proposed a new ranking metric, Normalized Enrichment Discounted
Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of
ranking predictions. Results showed that the GBDT model with learning-to-rank
outperformed both existing GBDT regression models and RankSVM on diverse
datasets. Moreover, NEDCG showed that predictions by regression were comparable
to random predictions in multi-assay, multi-family datasets, demonstrating its
usefulness for a more direct assessment of compound screening performance.
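For reference (standard information-retrieval definitions, restated here rather than quoted from the paper), the metric discussed above is

    \mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)},
    \qquad
    \mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},

where \mathrm{rel}_i is the relevance grade of the item at rank i and \mathrm{IDCG@}k is the DCG@k of the ideal (perfectly sorted) ranking. Because the only reference point is the ideal ranking, the NDCG of a random ordering is not anchored at any fixed value but depends on the label distribution, which is why an NDCG score alone cannot reveal worse-than-random behaviour; the enrichment-based normalization in NEDCG is intended to provide such a random baseline (see the paper for its exact definition).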
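As a minimal sketch of the kind of GBDT learning-to-rank setup described above (not the authors' code; LightGBM's built-in lambdarank objective is assumed, and the descriptor matrix, activity values, and binning scheme below are illustrative placeholders):

    # Minimal sketch: GBDT learning-to-rank for compound ranking with LightGBM.
    # All data here is synthetic; real use would supply compound descriptors,
    # per-assay groups, and relevance grades derived from measured activities.
    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)

    n_assays, n_per_assay, n_features = 3, 100, 200
    X = rng.normal(size=(n_assays * n_per_assay, n_features))   # descriptors
    activity = rng.normal(size=(n_assays, n_per_assay))         # e.g. pIC50

    # Ranking losses expect integer relevance grades; bin activity within each
    # assay so grades (0..3) are comparable across assays.
    y = np.concatenate([
        np.digitize(a, np.quantile(a, [0.5, 0.8, 0.95])) for a in activity
    ])
    group = [n_per_assay] * n_assays   # compounds per assay ("query" sizes)

    ranker = lgb.LGBMRanker(
        objective="lambdarank",        # GBDT with a ranking loss
        n_estimators=200,
        learning_rate=0.05,
    )
    ranker.fit(X, y, group=group)

    # Rank the first assay's compounds by predicted score (higher = earlier).
    scores = ranker.predict(X[:n_per_assay])
    ranking = np.argsort(-scores)

The regression baseline compared in the paper corresponds, in this sketch, to swapping LGBMRanker for LGBMRegressor trained directly on the activity values; the ranking-specific ingredients are the per-assay group sizes and the graded relevance labels.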
Related papers
- Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z)
- Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning [61.57383634677747]
Graph anomaly detection (GAD) has attracted increasing attention in machine learning and data mining.
Here, we propose a normality learning-based GAD framework via multi-scale contrastive learning networks (NLGAD for short).
Notably, the proposed algorithm improves the detection performance (up to 5.89% AUC gain) compared with the state-of-the-art methods.
arXiv Detail & Related papers (2023-09-12T08:06:04Z)
- Predictive change point detection for heterogeneous data [1.1720726814454114]
"Predict and Compare" is a change point detection framework assisted by a predictive machine learning model.
It outperforms online CPD routines in terms of false positive rate and out-of-control average run length.
The power of the method is demonstrated in a tribological case study.
arXiv Detail & Related papers (2023-05-11T07:59:18Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance [17.929524924008962]
In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier.
While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks.
We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model.
arXiv Detail & Related papers (2022-08-18T06:51:23Z)
- The Concordance Index decomposition: A measure for a deeper understanding of survival prediction models [3.186455928607442]
The Concordance Index (C-index) is a commonly used metric in Survival Analysis for evaluating the performance of a prediction model.
We propose a decomposition of the C-index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases.
arXiv Detail & Related papers (2022-02-28T23:50:47Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference models on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Interpretable Learning-to-Rank with Generalized Additive Models [78.42800966500374]
Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area.
Recent progress on interpretable ranking models largely focuses on generating post-hoc explanations for existing black-box ranking models.
We lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks.
arXiv Detail & Related papers (2020-05-06T01:51:30Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)