Development and Evaluation of Conformal Prediction Methods for QSAR
- URL: http://arxiv.org/abs/2304.00970v1
- Date: Mon, 3 Apr 2023 13:41:09 GMT
- Title: Development and Evaluation of Conformal Prediction Methods for QSAR
- Authors: Yuting Xu, Andy Liaw, Robert P. Sheridan, Vladimir Svetnik
- Abstract summary: The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting biological activities of compounds.
Most machine learning (ML) algorithms that achieve superior predictive performance require some add-on methods for estimating uncertainty of their prediction.
Conformal prediction (CP) is a promising approach. It is agnostic to the prediction algorithm and can produce valid prediction intervals under some weak assumptions on the data distribution.
- Score: 0.5161531917413706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quantitative structure-activity relationship (QSAR) regression model is a
commonly used technique for predicting biological activities of compounds using
their molecular descriptors. Predictions from QSAR models can help, for
example, to optimize molecular structure; prioritize compounds for further
experimental testing; and estimate their toxicity. In addition to the accurate
estimation of the activity, it is highly desirable to obtain some estimate of
the uncertainty associated with the prediction, e.g., calculate a prediction
interval (PI) containing the true molecular activity with a pre-specified
probability, say 70%, 90% or 95%. The challenge is that most machine learning
(ML) algorithms that achieve superior predictive performance require some
add-on methods for estimating uncertainty of their prediction. The development
of these algorithms is an active area of research by statistical and ML
communities but their implementation for QSAR modeling remains limited.
Conformal prediction (CP) is a promising approach. It is agnostic to the
prediction algorithm and can produce valid prediction intervals under some weak
assumptions on the data distribution. We proposed computationally efficient CP
algorithms tailored to the most advanced ML models, including Deep Neural
Networks and Gradient Boosting Machines. The validity and efficiency of
proposed conformal predictors are demonstrated on a diverse collection of QSAR
datasets as well as simulation studies.
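To make the abstract's central idea concrete (an add-on interval around any point predictor, with a pre-specified coverage probability), here is a minimal sketch of generic split (inductive) conformal regression. It is not the authors' algorithm, which the abstract does not spell out. It assumes exchangeable data and uses the absolute residual as the nonconformity score. The function names (`conformal_quantile`, `calibrate`, `predict_interval`, `fit_line`, `demo_coverage`) are hypothetical. A toy least-squares line stands in for a DNN or GBM; any `predict` callable could be used instead.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Predictor = Callable[[float], float]


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(scores)
    # Tiny epsilon guards against float round-up, e.g. 9.000000001 -> 10.
    k = math.ceil((n + 1) * (1.0 - alpha) - 1e-12)
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def calibrate(predict: Predictor, x_cal: Sequence[float],
              y_cal: Sequence[float], alpha: float) -> float:
    """Compute the interval half-width once from a held-out calibration set."""
    scores = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
    return conformal_quantile(scores, alpha)


def predict_interval(predict: Predictor, q: float, x: float) -> Tuple[float, float]:
    """Symmetric prediction interval around the model's point prediction."""
    y_hat = predict(x)
    return y_hat - q, y_hat + q


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Toy underlying model (ordinary least squares on one feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def demo_coverage(seed: int = 0, alpha: float = 0.1) -> float:
    """Empirical test-set coverage on synthetic data y = 2x + 1 + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(3000)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    # Disjoint splits: proper training, calibration, and test sets.
    train, cal, test = slice(0, 500), slice(500, 1000), slice(1000, 3000)
    model = fit_line(xs[train], ys[train])
    q = calibrate(model, xs[cal], ys[cal], alpha)
    covered = 0
    for x, y in zip(xs[test], ys[test]):
        lo, hi = predict_interval(model, q, x)
        covered += lo <= y <= hi
    return covered / len(xs[test])


if __name__ == "__main__":
    print(f"empirical coverage at 90% target: {demo_coverage():.3f}")
```

The model is fit only on the training split, so the calibration residuals behave like residuals on new compounds. Under exchangeability this yields marginal coverage of at least 1 - alpha, whatever the underlying model is. Because the quantile is computed once, each new prediction costs only one extra addition and subtraction. This design keeps the add-on cheap. The trade-off is that the interval width is constant across inputs.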
Related papers
- Ensemble Prediction via Covariate-dependent Stacking [0.0]
This study proposes a novel approach to ensemble prediction, called covariate-dependent stacking (CDST).
Unlike traditional stacking methods, CDST allows model weights to vary flexibly as a function of covariates, thereby enhancing predictive performance in complex scenarios.
Our findings suggest that CDST is especially valuable for, but not limited to, spatio-temporal prediction problems, offering a powerful tool for researchers and practitioners in various data analysis fields.
(2024-08-19T07:31:31Z)
- Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL).
It exploits the dependency among a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into a weighted ensemble score.
The efficacy of the proposed methods is rigorously assessed through both simulation studies and real-world application of risk genes discovery.
(2024-08-14T20:14:42Z)
- Achieving Well-Informed Decision-Making in Drug Discovery: A Comprehensive Calibration Study using Neural Network-Based Structure-Activity Models [4.619907534483781]
Computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents.
However, such models can be poorly calibrated, which results in unreliable uncertainty estimates.
We show that combining a post hoc calibration method with well-performing uncertainty quantification approaches can boost model accuracy and calibration.
(2024-07-19T10:29:00Z)
- CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding [62.075029712357]
This work introduces Cognitive Diffusion Probabilistic Models (CogDPM).
CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models, and it weights the guidance with precision weights estimated from an inherent property of diffusion models.
We apply CogDPM to real-world prediction tasks using United Kingdom precipitation and surface wind datasets.
(2024-05-03T15:54:50Z)
- Prediction-Powered Inference [68.97619568620709]
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system.
The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients.
Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning.
(2023-01-23T18:59:28Z)
- Low cost prediction of probability distributions of molecular properties for early virtual screening [0.8702432681310399]
This article applies the Hierarchical Correlation Reconstruction approach, previously used in the analysis of demographic, financial and astronomical data.
The methodology therefore offers strong support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics.
(2022-07-21T13:29:26Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
(2022-05-06T13:18:31Z)
- Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when used in fully-, semi- and weakly-supervised frameworks.
(2021-10-13T01:23:48Z)
- When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
(2021-06-07T18:31:47Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
(2021-06-03T08:32:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.