Assigning Confidence to Molecular Property Prediction
- URL: http://arxiv.org/abs/2102.11439v1
- Date: Tue, 23 Feb 2021 01:03:48 GMT
- Title: Assigning Confidence to Molecular Property Prediction
- Authors: AkshatKumar Nigam, Robert Pollice, Matthew F. D. Hurley, Riley J.
Hickman, Matteo Aldeghi, Naruki Yoshikawa, Seyone Chithrananda, Vincent A.
Voelz, Alán Aspuru-Guzik
- Abstract summary: Machine learning has emerged as a powerful strategy to learn from existing datasets and perform predictions on unseen molecules.
We discuss popular strategies for predicting molecular properties relevant to drug design, their corresponding uncertainty sources and methods to quantify uncertainty and confidence.
- Score: 1.015785232738621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Introduction: Computational modeling has rapidly advanced over the last
decades, especially to predict molecular properties for chemistry, material
science and drug design. Recently, machine learning techniques have emerged as
a powerful and cost-effective strategy to learn from existing datasets and
perform predictions on unseen molecules. Accordingly, the explosive rise of
data-driven techniques raises an important question: What confidence can be
assigned to molecular property predictions and what techniques can be used for
that purpose?
Areas covered: In this work, we discuss popular strategies for predicting
molecular properties relevant to drug design, their corresponding uncertainty
sources and methods to quantify uncertainty and confidence. First, our
considerations for assessing confidence begin with dataset bias and size,
data-driven property prediction and feature design. Next, we discuss property
simulation via molecular docking, and free-energy simulations of binding
affinity in detail. Lastly, we investigate how these uncertainties propagate to
generative models, as they are usually coupled with property predictors.
Expert opinion: Computational techniques are paramount to reduce the
prohibitive cost and timing of brute-force experimentation when exploring the
enormous chemical space. We believe that assessing uncertainty in property
prediction models is essential whenever closed-loop drug design campaigns
relying on high-throughput virtual screening are deployed. Accordingly,
considering sources of uncertainty leads to better-informed experimental
validations, more reliable predictions and to more realistic expectations of
the entire workflow. Overall, this increases confidence in the predictions and
designs and, ultimately, accelerates drug design.
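As a rough illustration of what "assigning confidence" to a data-driven property prediction can mean, the sketch below fits a small bootstrap ensemble and reports the spread of its predictions as an uncertainty estimate. It is not taken from the paper: the function name `bootstrap_ensemble_predict`, the linear model, and the random toy features (standing in for molecular descriptors) are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def bootstrap_ensemble_predict(X_train, y_train, X_query, n_models=50, seed=0):
    """Hypothetical helper: fit least-squares models on bootstrap resamples
    and return the mean and standard deviation of their predictions."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    # Append a constant column so each model also fits an intercept.
    A_train = np.hstack([X_train, np.ones((n, 1))])
    A_query = np.hstack([X_query, np.ones((X_query.shape[0], 1))])
    preds = np.empty((n_models, X_query.shape[0]))
    for m in range(n_models):
        # Resample the training set with replacement.
        idx = rng.integers(0, n, size=n)
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds[m] = A_query @ coef
    # Ensemble mean is the prediction; ensemble spread is the uncertainty proxy.
    return preds.mean(axis=0), preds.std(axis=0)

# Toy data: 200 "molecules" with 3 made-up descriptors in [-1, 1].
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

X_near = np.zeros((1, 3))       # inside the range covered by the training data
X_far = np.full((1, 3), 10.0)   # far outside the training range

mean_near, std_near = bootstrap_ensemble_predict(X, y, X_near)
mean_far, std_far = bootstrap_ensemble_predict(X, y, X_far)
print("near:", mean_near[0], "+/-", std_near[0])
print("far: ", mean_far[0], "+/-", std_far[0])
```

In this toy setting the ensemble spread is larger for the query far from the training data than for the one inside it, which is the qualitative behaviour one hopes for from an uncertainty estimate. The spread only reflects variability from resampling the data, so it says nothing about label noise or about a model class that is wrong for the problem.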
Related papers
- Balancing Molecular Information and Empirical Data in the Prediction of Physico-Chemical Properties [8.649679686652648]
We propose a general method for combining molecular descriptors with representation learning.
The proposed hybrid model exploits chemical structure information using graph neural networks.
It automatically detects cases where structure-based predictions are unreliable, in which case it corrects them by representation-learning based predictions.
arXiv Detail & Related papers (2024-06-12T10:51:00Z)
- Towards robust prediction of material properties for nuclear reactor design under scarce data -- a study in creep rupture property [7.068581430279433]
Key challenges include the limited availability of data sets and insufficient consideration of the uncertainty in the data, model, and prediction.
This paper presents a meta-learning based approach that is both uncertainty- and prior knowledge-informed, aiming at trustworthy predictions of material properties for nuclear reactor design.
arXiv Detail & Related papers (2024-05-28T06:20:14Z)
- CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding [62.075029712357]
This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM).
CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and weights the guidance with precision weights estimated by the inherent property of diffusion models.
We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and surface wind datasets.
arXiv Detail & Related papers (2024-05-03T15:54:50Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm that has been shown to be effective in characterizing both properties.
We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods.
arXiv Detail & Related papers (2023-02-02T13:30:48Z)
- Low cost prediction of probability distributions of molecular properties for early virtual screening [0.8702432681310399]
This article applies the Hierarchical Correlation Reconstruction approach, previously used in the analysis of demographic, financial and astronomical data.
The whole methodology therefore offers strong support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics.
arXiv Detail & Related papers (2022-07-21T13:29:26Z)
- What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience [63.75363908696257]
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining, from the infinitely many predictions that the agent could possibly make, which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Calibrated Uncertainty for Molecular Property Prediction using Ensembles of Message Passing Neural Networks [11.47132155400871]
We extend a message passing neural network designed specifically for predicting properties of molecules and materials.
We show that our approach results in accurate models for predicting molecular formation energies with calibrated uncertainty.
arXiv Detail & Related papers (2021-07-13T13:28:11Z)
- Quantifying sources of uncertainty in drug discovery predictions with probabilistic models [0.0]
Knowing the uncertainty in a prediction is critical when making expensive investment decisions and when patient safety is paramount.
Machine learning (ML) models in drug discovery typically provide only a single best estimate and ignore all sources of uncertainty.
Probabilistic predictive models (PPMs) can incorporate uncertainty in both the data and model, and return a distribution of predicted values.
arXiv Detail & Related papers (2021-05-18T18:54:54Z)
- Explainable Deep Relational Networks for Predicting Compound-Protein Affinities and Contacts [80.69440684790925]
DeepRelations is a physics-inspired deep relational network with intrinsically explainable architecture.
It shows superior interpretability to the state-of-the-art.
It boosts the AUPRC of contact prediction 9.5-, 16.9-, 19.3- and 5.7-fold for the test, compound-unique, protein-unique, and both-unique sets, respectively.
arXiv Detail & Related papers (2019-12-29T00:14:07Z)