Related papers: Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

URL: http://arxiv.org/abs/2205.02169v1
Date: Wed, 4 May 2022 16:36:24 GMT
Title: Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain
Authors: Kairi Furui, Masahito Ohue
Abstract summary: gradient boosting decision tree (GBDT)-based learning-to-rank methods have gained popularity recently. Normalized Enrichment Discounted Cumulative Gain (NEDCG) aims to properly evaluate the goodness of ranking predictions. NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to the problem of ligand-based virtual screening, to accelerate the early stages of new drug development. Ranking prediction models learn based on ordinal relationships, making them suitable for integrating assay data from various environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, they have not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have gained popularity recently. Furthermore, although the ranking metric called Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether the predictions are better than those of other models. In other words, NDCG is incapable of recognizing when a prediction model produces worse than random results. Nevertheless, NDCG is still used in the performance evaluation of compound screening using learning-to-rank. This study used the GBDT model with ranking loss functions, called lambdarank and lambdaloss, for ligand-based virtual screening; results were compared with existing RankSVM methods and GBDT models using regression. We also proposed a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of ranking predictions. Results showed that the GBDT model with learning-to-rank outperformed existing regression methods using GBDT and RankSVM on diverse datasets. Moreover, NEDCG showed that predictions by regression were comparable to random predictions in multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.

Related papers

Deep evolving semi-supervised anomaly detection [14.027613461156864]
The aim of this paper is to formalise the task of continual semi-supervised anomaly detection (CSAD) The paper introduces a baseline model of a variational autoencoder (VAE) to work with semi-supervised data along with a continual learning method of deep generative replay with outlier rejection.
arXiv Detail & Related papers (2024-12-01T15:48:37Z)
Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM) which can be viewed as a gradient boosting algorithm combining score matching. We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy. Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z)
Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations. We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z)
Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning [61.57383634677747]
Graph anomaly detection (GAD) has attracted increasing attention in machine learning and data mining. Here, we propose a normality learning-based GAD framework via multi-scale contrastive learning networks (NLGAD for abbreviation) Notably, the proposed algorithm improves the detection performance (up to 5.89% AUC gain) compared with the state-of-the-art methods.
arXiv Detail & Related papers (2023-09-12T08:06:04Z)
Predictive change point detection for heterogeneous data [1.1720726814454114]
"Predict and Compare" is a change point detection framework assisted by a predictive machine learning model. It outperforms online CPD routines in terms of false positive rate and out-of-control average run length. The power of the method is demonstrated in a tribological case study.
arXiv Detail & Related papers (2023-05-11T07:59:18Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance [17.929524924008962]
In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier. While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks. We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model.
arXiv Detail & Related papers (2022-08-18T06:51:23Z)
The Concordance Index decomposition: A measure for a deeper understanding of survival prediction models [3.186455928607442]
The Concordance Index (C-index) is a commonly used metric in Survival Analysis for evaluating the performance of a prediction model. We propose a decomposition of the C-index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases.
arXiv Detail & Related papers (2022-02-28T23:50:47Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance of guided gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model. We derive novel non-asymptotic bounds on the classification error of the latter. Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z)
Interpretable Learning-to-Rank with Generalized Additive Models [78.42800966500374]
Interpretability of learning-to-rank models is a crucial yet relatively under-examined research area. Recent progress on interpretable ranking models largely focuses on generating post-hoc explanations for existing black-box ranking models. We lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks.
arXiv Detail & Related papers (2020-05-06T01:51:30Z)
Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting. G-DAUGC consistently outperforms existing data augmentation methods based on back-translation. Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.