Predicting Drug Solubility Using Different Machine Learning Methods --
Linear Regression Model with Extracted Chemical Features vs Graph
Convolutional Neural Network
- URL: http://arxiv.org/abs/2308.12325v2
- Date: Fri, 5 Jan 2024 01:28:36 GMT
- Title: Predicting Drug Solubility Using Different Machine Learning Methods --
Linear Regression Model with Extracted Chemical Features vs Graph
Convolutional Neural Network
- Authors: John Ho, Zhao-Heng Yin, Colin Zhang, Nicole Guo, Yang Ha
- Abstract summary: We employ two machine learning models: a linear regression model and a graph convolutional neural network (GCNN) model, using various experimental datasets.
The present GCNN model has limited interpretability, while the linear regression model allows scientists to analyze the underlying factors in greater depth.
From the perspective of chemistry, using the linear regression model, we elucidated the impact of individual atom species and functional groups on overall solubility.
- Score: 1.8936798735951967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the solubility of given molecules remains crucial in the
pharmaceutical industry. In this study, we revisited this extensively studied
topic, leveraging the capabilities of contemporary computing resources. We
employed two machine learning models: a linear regression model and a graph
convolutional neural network (GCNN) model, using various experimental datasets.
Both methods yielded reasonable predictions, with the GCNN model exhibiting the
highest level of performance. However, the present GCNN model has limited
interpretability, while the linear regression model allows scientists to
analyze the underlying factors in greater depth through feature importance
analysis, although more human input and evaluation of the overall dataset are
required. From the perspective of chemistry, using the linear regression model,
we elucidated the impact of individual atom species and functional groups on
overall solubility, highlighting the significance of comprehending how chemical
structure influences chemical properties in the drug development process. We
found that introducing oxygen atoms can increase the solubility of organic
molecules, while almost all heteroatoms other than oxygen and nitrogen tend
to decrease solubility.
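The abstract's linear regression approach, with feature importance read off the fitted coefficients, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the features are crude element counts taken from simple SMILES strings (no bracket atoms), the molecule list is arbitrary, and the targets are generated from made-up weights (`TRUE_W`), not from measured solubilities or results in the paper. The point is only to show how per-element coefficients fall out of an ordinary least-squares fit.

```python
import re

# Hypothetical feature set: counts of a few elements in a SMILES string.
ELEMENTS = ["C", "N", "O", "Cl"]
# Two-letter symbols come first so "Cl" is not read as carbon; lowercase = aromatic.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[cnos]")


def atom_counts(smiles):
    """Count atoms of each element in ELEMENTS (simple SMILES only, no brackets)."""
    tokens = [t.capitalize() for t in TOKEN.findall(smiles)]
    return [tokens.count(e) for e in ELEMENTS]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Arbitrary small molecules chosen so the feature matrix has full column rank.
MOLECULES = ["CC", "CCO", "CCN", "CCCl", "CCCC", "OCCO", "c1ccccc1Cl"]
# Made-up weights [intercept, C, N, O, Cl] used only to synthesize targets.
TRUE_W = [0.5, -0.5, 0.3, 1.0, -0.8]

X = [[1] + atom_counts(s) for s in MOLECULES]
y = [sum(w * v for w, v in zip(TRUE_W, row)) for row in X]  # synthetic targets

weights = fit_linear(X, y)
for name, w in zip(["intercept"] + ELEMENTS, weights):
    print(f"{name:>9}: {w:+.3f}")
```

Because the targets are noise-free and synthetic, the fit simply recovers `TRUE_W`. With real measurements, the sign and size of each coefficient would be the quantity a chemist inspects, which is the interpretability advantage the abstract attributes to the linear model.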
Related papers
- YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often miss complex molecular structures, leading to inaccuracies.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
arXiv Detail & Related papers (2024-06-27T12:40:29Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
- Rxn Hypergraph: a Hypergraph Attention Model for Chemical Reaction Representation [70.97737157902947]
There is currently no universal and widely adopted method for robustly representing chemical reactions.
Here we exploit graph-based representations of molecular structures to develop and test a hypergraph attention neural network approach.
We evaluate this hypergraph representation in three experiments using three independent data sets of chemical reactions.
arXiv Detail & Related papers (2022-01-02T12:33:10Z)
- Size doesn't matter: predicting physico- or biochemical properties based on dozens of molecules [0.0]
The paper shows a significant improvement in the performance of models for target properties with a lack of data.
The effects of the dataset composition on model quality and the applicability domain of the resulting models are also considered.
arXiv Detail & Related papers (2021-07-22T18:57:24Z)
- Machine Learning Implicit Solvation for Molecular Dynamics [0.0]
We introduce Bornet, a graph neural network, to model the implicit solvent potential of mean force.
The success of this novel method demonstrates the potential benefit of applying machine learning methods in accurate modeling of solvent effects.
arXiv Detail & Related papers (2021-06-14T15:21:45Z)
- Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations [3.10678679607547]
The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules.
Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure.
We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance.
arXiv Detail & Related papers (2021-05-26T15:54:54Z)
- Accurate Prediction of Free Solvation Energy of Organic Molecules via Graph Attention Network and Message Passing Neural Network from Pairwise Atomistic Interactions [14.87390785780636]
We propose two novel models for the problem of free solvation energy predictions, based on the Graph Neural Network (GNN) architectures.
GNNs are capable of summarizing the predictive information of a molecule as low-dimensional features directly from its graph structure.
We show that our proposed models outperform all quantum mechanical and molecular dynamics methods in addition to existing alternative machine learning based approaches in the task of solvation free energy prediction.
arXiv Detail & Related papers (2021-04-15T22:15:18Z)
- Graph Neural Networks for the Prediction of Substrate-Specific Organic Reaction Conditions [79.45090959869124]
We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions.
We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions.
arXiv Detail & Related papers (2020-07-08T17:21:00Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the framework's learning process.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
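Several entries above describe graph neural networks as operating directly on the molecular graph. As a rough illustration of that idea, the sketch below runs one neighborhood-averaging step over the heavy-atom graph of ethanol and then sums the atom vectors into a single molecule-level vector. It is a deliberately simplified assumption: there are no learned weights or nonlinearities, the one-hot element features are hypothetical, and it does not reproduce the architecture of any paper listed here.

```python
# Heavy-atom graph of ethanol: C0 - C1 - O2 (hydrogens omitted).
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]
ELEMENTS = ["C", "O"]  # hypothetical one-hot vocabulary for this toy example


def one_hot(symbol):
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbors_with_self(n_atoms, bonds):
    """Map each atom index to the set containing itself and its bonded neighbors."""
    nbrs = {i: {i} for i in range(n_atoms)}
    for a, b in bonds:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def graph_conv(features, nbrs):
    """One parameter-free step: each atom becomes the mean of itself and its neighbors."""
    out = []
    for i in range(len(features)):
        group = [features[j] for j in sorted(nbrs[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def readout(features):
    """Sum atom vectors into one fixed-length molecule vector."""
    return [sum(col) for col in zip(*features)]


h0 = [one_hot(a) for a in ATOMS]
nbrs = neighbors_with_self(len(ATOMS), BONDS)
hidden = graph_conv(h0, nbrs)
graph_vec = readout(hidden)

for atom, vec in zip(ATOMS, hidden):
    print(atom, [round(v, 3) for v in vec])
print("molecule vector:", [round(v, 3) for v in graph_vec])
```

After one step, the middle carbon's vector already carries information about the adjacent oxygen. A trained model would apply a learned linear map and nonlinearity at each step and feed the readout vector to a regression head, which is where the predictive power, and the reduced interpretability, come from.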
This list is automatically generated from the titles and abstracts of the papers in this site.