Accurate, reliable and interpretable solubility prediction of druglike molecules with attention pooling and Bayesian learning
- URL: http://arxiv.org/abs/2210.07145v1
- Date: Thu, 29 Sep 2022 07:48:10 GMT
- Authors: Seongok Ryu and Sumin Lee
- Abstract summary: In silico prediction of solubility has been studied for its utility in virtual screening and lead optimization.
Recently, machine learning (ML) methods using experimental data have become popular because physics-based methods are not suitable for high-throughput tasks.
In this paper, we develop graph neural networks (GNNs) with a self-attention readout layer to improve prediction performance.
- Score: 1.8275108630751844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In drug discovery, aqueous solubility is an important pharmacokinetic
property that affects the absorption and assay availability of drugs. Thus, in
silico prediction of solubility has been studied for its utility in virtual
screening and lead optimization. Recently, machine learning (ML) methods trained
on experimental data have become popular because physics-based methods such as
quantum mechanics and molecular dynamics are not suitable for high-throughput
tasks due to their computational cost. However, ML methods can overfit when data
are scarce, which is the case for most chemical property datasets. Moreover, ML
methods are regarded as black-box functions in that it is difficult to interpret
the contribution of hidden features to outputs, which hinders the analysis and
modification of structure-activity relationships. To address these issues, we
developed Bayesian graph neural networks (GNNs) with a self-attention readout
layer. Unlike most attention-based GNNs, which apply self-attention in node
updates, applying self-attention at the readout layer enabled the model to
improve prediction performance and to identify atom-wise importance, which can
help lead optimization, as illustrated for three FDA-approved drugs. In
addition, Bayesian inference enables us to distinguish more accurate from less
accurate predictions according to their uncertainty in the solubility prediction
task. We expect that our accurate, reliable and interpretable model can be used
for more careful decision-making and various applications in drug development.
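The two mechanisms the abstract relies on can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. The readout scores each atom embedding against a single learned query vector, which may differ from the readout layer actually used in the paper (for example, a multi-head formulation). Bayesian inference is approximated here by Monte Carlo dropout: repeated stochastic forward passes whose spread serves as the uncertainty. The function names `attention_readout`, `dropout_head`, `mc_predict` and `split_by_uncertainty`, the query vector, the toy linear head and the threshold are all hypothetical.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def attention_readout(node_feats: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Pool atom embeddings into one molecule embedding (hypothetical single-query attention).

    Each atom i gets a score <h_i, q>. A softmax over atoms turns the scores into
    weights that sum to 1, and the molecule vector is the weighted sum of atom
    vectors. The weights can be read as atom-wise importance.
    """
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in node_feats]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_feats[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, node_feats)) for k in range(dim)]
    return pooled, weights


def dropout_head(pooled: Vector, w_out: Vector, p: float, rng: random.Random) -> float:
    """Toy linear output head with inverted dropout kept active at prediction time."""
    keep = 1.0 - p
    # Each input survives with probability `keep` and is rescaled by 1/keep.
    return sum((h / keep if rng.random() < keep else 0.0) * w for h, w in zip(pooled, w_out))


def mc_predict(predict_fn: Callable[[random.Random], float], n_samples: int = 30,
               seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation over stochastic forward passes (MC dropout assumption)."""
    rng = random.Random(seed)
    samples = [predict_fn(rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


def split_by_uncertainty(preds: Sequence[Tuple[float, float]],
                         max_std: float) -> Tuple[List[int], List[int]]:
    """Indices of predictions whose std is at or below / above a hypothetical threshold."""
    confident = [i for i, (_, std) in enumerate(preds) if std <= max_std]
    uncertain = [i for i, (_, std) in enumerate(preds) if std > max_std]
    return confident, uncertain


if __name__ == "__main__":
    atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d embeddings for three atoms
    query = [0.5, 0.5]                              # hypothetical learned query vector
    pooled, weights = attention_readout(atoms, query)
    print("atom weights:", [round(w, 3) for w in weights])  # atom 3 gets the largest weight
    w_out = [1.0, -2.0]                             # hypothetical output weights
    mean, std = mc_predict(lambda rng: dropout_head(pooled, w_out, 0.2, rng), n_samples=50)
    print(f"prediction {mean:.3f} +/- {std:.3f}")
    print(split_by_uncertainty([(mean, std), (-2.0, 0.9)], max_std=0.5))
```

Keeping dropout active at prediction time and averaging many passes is one common approximation to a Bayesian predictive distribution. Thresholding the resulting standard deviation then separates predictions that are likely more reliable from those that are likely less reliable, in the spirit of the abstract's claim about uncertainty-based filtering.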
Related papers
- YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often miss complex molecular structures, leading to inaccuracies.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
Date: 2024-06-27T12:40:29Z
- Physical formula enhanced multi-task learning for pharmacokinetics prediction [54.13787789006417]
A major challenge for AI-driven drug discovery is the scarcity of high-quality data.
We develop a physical formula enhanced multi-task learning (PEMAL) method that predicts four key pharmacokinetic parameters simultaneously.
Our experiments reveal that PEMAL significantly lowers the data demand compared with typical graph neural networks.
Date: 2024-04-16T07:42:55Z
- Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques for understanding model predictions, especially for critical decision-making in drug discovery, are assessed.
Date: 2023-08-02T22:18:41Z
- On the Interplay of Subset Selection and Informed Graph Neural Networks [3.091456764812509]
This work focuses on predicting the atomization energy of molecules in the QM9 dataset.
We show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques.
We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer.
Date: 2023-06-15T09:09:27Z
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
Date: 2023-02-10T08:05:19Z
- Uncertainty quantification for predictions of atomistic neural networks [0.0]
This paper explores the value of uncertainty quantification for the predictions of neural networks (NNs) trained on quantum chemical reference data.
The architecture of the PhysNet NN was suitably modified and the resulting model was evaluated with different metrics to quantify calibration, quality of predictions, and whether prediction error and the predicted uncertainty can be correlated.
Date: 2022-07-14T13:39:43Z
- SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction [127.43571146741984]
Drug-Target Affinity (DTA) is of vital importance in early-stage drug discovery.
Wet experiments remain the most reliable method, but they are time-consuming and resource-intensive.
Existing methods have primarily focused on developing techniques based on the available DTA data, without adequately addressing the data scarcity issue.
We present the SSM-DTA framework, which incorporates three simple yet highly effective strategies.
Date: 2022-06-20T14:53:25Z
- Extracting Chemical-Protein Interactions via Calibrated Deep Neural Network and Self-training [0.8376091455761261]
"Calibration" techniques have been applied to deep learning models to estimate data uncertainty and improve reliability.
In this study, to extract chemical-protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques.
Our approach has achieved state-of-the-art performance with regard to the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches.
Date: 2020-11-04T10:14:31Z
- Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
Date: 2020-11-03T18:51:18Z
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapid in silico screening of billions of molecules for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
Date: 2020-06-25T08:46:37Z