Compound virtual screening by learning-to-rank with gradient boosting
decision tree and enrichment-based cumulative gain
- URL: http://arxiv.org/abs/2205.02169v1
- Date: Wed, 4 May 2022 16:36:24 GMT
- Title: Compound virtual screening by learning-to-rank with gradient boosting
decision tree and enrichment-based cumulative gain
- Authors: Kairi Furui, Masahito Ohue
- Abstract summary: Gradient boosting decision tree (GBDT)-based learning-to-rank methods have gained popularity recently.
Normalized Enrichment Discounted Cumulative Gain (NEDCG) aims to properly evaluate the goodness of ranking predictions.
NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-to-rank, a machine learning technique widely used in information
retrieval, has recently been applied to the problem of ligand-based virtual
screening, to accelerate the early stages of new drug development. Ranking
prediction models learn based on ordinal relationships, making them suitable
for integrating assay data from various environments. Existing studies of rank
prediction in compound screening have generally used a learning-to-rank method
called RankSVM. However, they have not been compared with or validated against
the gradient boosting decision tree (GBDT)-based learning-to-rank methods that
have gained popularity recently. Furthermore, although the ranking metric
called Normalized Discounted Cumulative Gain (NDCG) is widely used in
information retrieval, it only determines whether the predictions are better
than those of other models. In other words, NDCG is incapable of recognizing
when a prediction model produces worse than random results. Nevertheless, NDCG
is still used in the performance evaluation of compound screening using
learning-to-rank. This study used the GBDT model with ranking loss functions,
called lambdarank and lambdaloss, for ligand-based virtual screening; results
were compared with existing RankSVM methods and GBDT models using regression.
We also proposed a new ranking metric, Normalized Enrichment Discounted
Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of
ranking predictions. Results showed that the GBDT model with learning-to-rank
outperformed existing regression methods using GBDT and RankSVM on diverse
datasets. Moreover, NEDCG showed that predictions by regression were comparable
to random predictions in multi-assay, multi-family datasets, demonstrating its
usefulness for a more direct assessment of compound screening performance.
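The abstract's point that NDCG cannot flag worse-than-random rankings can be illustrated with a small sketch. It assumes the common exponential-gain, log2-discount form of NDCG, which may differ from the exact variant used in the paper, and the activity labels are hypothetical. It does not implement NEDCG, whose definition is not given in this summary.

```python
import math
import random


def dcg(relevances):
    """Discounted cumulative gain: gain (2^rel - 1), discount log2(rank + 1), ranks from 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(true_relevance, predicted_scores):
    """NDCG of the ranking induced by sorting compounds by predicted score (descending)."""
    order = sorted(range(len(true_relevance)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal_dcg = dcg(sorted(true_relevance, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical graded activity labels for ten compounds (assumption, not paper data).
activity = [3, 2, 2, 1, 1, 0, 0, 0, 0, 0]

perfect = ndcg(activity, activity)                # scores agree with true activity
worst = ndcg(activity, [-a for a in activity])    # fully inverted ranking

# Average NDCG over many random score assignments (fixed seed for repeatability).
rng = random.Random(0)
random_mean = sum(
    ndcg(activity, [rng.random() for _ in activity]) for _ in range(2000)
) / 2000

print(f"perfect={perfect:.3f} random_mean={random_mean:.3f} worst={worst:.3f}")
```

In this sketch even the fully inverted ranking scores well above zero, and the random baseline sits at a dataset-dependent value in between. A raw NDCG value therefore says little about whether a model beats chance unless it is compared against such a baseline.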
Related papers
- Sequence Generation Modeling for Continuous Value Prediction [35.33333441236041]
Continuous value prediction (CVP) plays a crucial role in short video recommendation, capturing user preferences through precise numerical estimations.
We introduce a novel Generative Regression framework for CVP inspired by sequence generation techniques in language modeling.
Our method transforms numerical values into token sequences through structural discretization, preserving original data fidelity while improving prediction precision.
arXiv Detail & Related papers (2024-12-28T16:48:55Z)
- Deep evolving semi-supervised anomaly detection [14.027613461156864]
The aim of this paper is to formalise the task of continual semi-supervised anomaly detection (CSAD).
The paper introduces a baseline model of a variational autoencoder (VAE) to work with semi-supervised data along with a continual learning method of deep generative replay with outlier rejection.
arXiv Detail & Related papers (2024-12-01T15:48:37Z)
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
- Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z)
- Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning [61.57383634677747]
Graph anomaly detection (GAD) has attracted increasing attention in machine learning and data mining.
Here, we propose a normality learning-based GAD framework via multi-scale contrastive learning networks (abbreviated NLGAD).
Notably, the proposed algorithm improves the detection performance (up to 5.89% AUC gain) compared with the state-of-the-art methods.
arXiv Detail & Related papers (2023-09-12T08:06:04Z)
- Predictive change point detection for heterogeneous data [1.1720726814454114]
"Predict and Compare" is a change point detection framework assisted by a predictive machine learning model.
It outperforms online CPD routines in terms of false positive rate and out-of-control average run length.
The power of the method is demonstrated in a tribological case study.
arXiv Detail & Related papers (2023-05-11T07:59:18Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train inference models on inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
- Interpretable Learning-to-Rank with Generalized Additive Models [78.42800966500374]
Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area.
Recent progress on interpretable ranking models largely focuses on generating post-hoc explanations for existing black-box ranking models.
We lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks.
arXiv Detail & Related papers (2020-05-06T01:51:30Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)