Low cost prediction of probability distributions of molecular properties
for early virtual screening
- URL: http://arxiv.org/abs/2207.11174v1
- Date: Thu, 21 Jul 2022 13:29:26 GMT
- Title: Low cost prediction of probability distributions of molecular properties
for early virtual screening
- Authors: Jarek Duda, Sabina Podlewska
- Abstract summary: This article applies the Hierarchical Correlation Reconstruction approach, previously applied in the analysis of demographic, financial and astronomical data.
The whole methodology therefore supports medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics.
- Score: 0.8702432681310399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the general focus is on predicting values, it is mathematically more
appropriate to predict probability distributions, which offer additional
possibilities such as prediction of uncertainty, higher moments and quantiles. For
the purposes of computer-aided drug design, this article applies the
Hierarchical Correlation Reconstruction approach, previously applied in the
analysis of demographic, financial and astronomical data. Instead of a single
linear regression to predict values, it uses multiple linear regressions to
independently predict multiple moments, finally combining them into a predicted
probability distribution, here of several ADMET properties based on the
substructural fingerprint developed by Klekota & Roth. The discussed application
example is inexpensive selection of a percentage of molecules with properties
nearly certain to be in a predicted or chosen range during virtual screening.
Such an approach can facilitate the interpretation of the results, as
predictions characterized by a high level of uncertainty are automatically
detected. In addition, for each of the investigated predictive problems, we
detected crucial structural features, which should be carefully considered when
optimizing compounds towards a particular property. The whole methodology
developed in the study therefore supports medicinal chemists, as it enables
fast rejection of compounds with the lowest potential for the desired
physicochemical/ADMET characteristics and guides the compound
optimization process.
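The idea in the abstract (several independent linear regressions, one per moment, combined into a predicted density) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the property is first normalised to roughly uniform on [0, 1] by its empirical distribution, that "moments" are coefficients in an orthonormal Legendre basis, and that the combined polynomial is clipped to a small positive floor and renormalised. The function names, the `degree` and `floor` parameters, and the synthetic fingerprints are all hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_basis(u, degree):
    """Orthonormal shifted Legendre polynomials f_1..f_degree on [0, 1]."""
    u = np.asarray(u, dtype=float)
    cols = []
    for j in range(1, degree + 1):
        c = np.zeros(j + 1)
        c[j] = 1.0  # select P_j
        cols.append(np.sqrt(2 * j + 1) * legendre.legval(2 * u - 1, c))
    return np.stack(cols, axis=-1)

def fit_hcr(X, y, degree=4):
    """One least-squares regression per basis function (assumed setup)."""
    y_sorted = np.sort(y)
    # Empirical-CDF normalisation: u is roughly uniform on (0, 1).
    u = (np.searchsorted(y_sorted, y, side="right") - 0.5) / len(y)
    X1 = np.column_stack([np.ones(len(X)), X])  # add intercept
    beta, *_ = np.linalg.lstsq(X1, legendre_basis(u, degree), rcond=None)
    return beta, y_sorted

def predict_density(beta, x, grid, floor=0.1):
    """Predicted density of the normalised property on a uniform grid."""
    a = np.concatenate([[1.0], x]) @ beta        # predicted coefficients
    rho = 1.0 + legendre_basis(grid, len(a)) @ a
    rho = np.maximum(rho, floor)                 # polynomial can dip below 0
    return rho / rho.mean()                      # integrates to ~1 on [0, 1]

def prob_in_range(beta, y_sorted, x, lo, hi, n_grid=1000):
    """Predicted probability that the property lies in [lo, hi]."""
    grid = (np.arange(n_grid) + 0.5) / n_grid
    rho = predict_density(beta, x, grid)
    u_lo, u_hi = np.searchsorted(y_sorted, [lo, hi]) / len(y_sorted)
    return rho[(grid >= u_lo) & (grid <= u_hi)].sum() / n_grid

# Synthetic stand-in for binary substructure fingerprints (not real data).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 5)).astype(float)
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)
beta, y_sorted = fit_hcr(X, y)
p_with = prob_in_range(beta, y_sorted, np.array([1.0, 0, 0, 0, 0]), 1.0, np.inf)
p_without = prob_in_range(beta, y_sorted, np.zeros(5), 1.0, np.inf)
```

In a screening setting, molecules could then be ranked by such a probability and only the top fraction retained; a flat predicted density signals an uncertain prediction.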
Related papers
- Optimizing Probabilistic Conformal Prediction with Vectorized Non-Conformity Scores [6.059745771017814]
We propose a novel framework that enhances efficiency by first vectorizing the non-conformity scores with ranked samples and then optimizing the shape of the prediction set by varying the quantiles for samples at the same rank.
Our method delivers valid coverage while producing discontinuous and more efficient prediction sets, making it particularly suited for high-stakes applications.
arXiv Detail & Related papers (2024-10-17T16:37:03Z)
- Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties [8.649679686652648]
We propose a general method for combining molecular descriptors with representation learning.
The proposed hybrid model exploits chemical structure information using graph neural networks.
It automatically detects cases where structure-based predictions are unreliable, in which case it corrects them by representation-learning based predictions.
arXiv Detail & Related papers (2024-06-12T10:51:00Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- Development and Evaluation of Conformal Prediction Methods for QSAR [0.5161531917413706]
The quantitative structure-activity relationship (QSAR) regression model is a commonly used technique for predicting biological activities of compounds.
Most machine learning (ML) algorithms that achieve superior predictive performance require some add-on methods for estimating uncertainty of their prediction.
Conformal prediction (CP) is a promising approach. It is agnostic to the prediction algorithm and can produce valid prediction intervals under some weak assumptions on the data distribution.
arXiv Detail & Related papers (2023-04-03T13:41:09Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Calibrated Uncertainty for Molecular Property Prediction using Ensembles of Message Passing Neural Networks [11.47132155400871]
We extend a message passing neural network designed specifically for predicting properties of molecules and materials.
We show that our approach results in accurate models for predicting molecular formation energies with calibrated uncertainty.
arXiv Detail & Related papers (2021-07-13T13:28:11Z)
- Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression [51.770998056563094]
Probabilistic Gradient Boosting Machines (PGBM) is a method to create probabilistic predictions with a single ensemble of decision trees.
We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-03T08:32:13Z)
- Prediction in latent factor regression: Adaptive PCR and beyond [2.9439848714137447]
We prove a master theorem that establishes a risk bound for a large class of predictors.
We use our main theorem to recover known risk bounds for the minimum-norm interpolating predictor.
We conclude with a detailed simulation study to support and complement our theoretical results.
arXiv Detail & Related papers (2020-07-20T12:42:47Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Invariant Rationalization [84.1861516092232]
A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale.
We introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments.
We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments.
arXiv Detail & Related papers (2020-03-22T00:50:27Z)