Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations
- URL: http://arxiv.org/abs/2105.12638v2
- Date: Thu, 27 May 2021 01:03:43 GMT
- Title: Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations
- Authors: Gihan Panapitiya, Michael Girard, Aaron Hollas, Vijay Murugesan, Wei Wang, Emily Saldanha
- Abstract summary: The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules.
Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure.
We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance.
- Score: 3.10678679607547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Determining the aqueous solubility of molecules is a vital step in many
pharmaceutical, environmental, and energy storage applications. Despite efforts
made over decades, there are still challenges associated with developing a
solubility prediction model with satisfactory accuracy for many of these
applications. The goal of this study is to develop a general model capable of
predicting the solubility of a broad range of organic molecules. Using the
largest currently available solubility dataset, we implement deep
learning-based models to predict solubility from molecular structure and
explore several different molecular representations including molecular
descriptors, simplified molecular-input line-entry system (SMILES) strings,
molecular graphs, and three-dimensional (3D) atomic coordinates, using four
different neural network architectures: fully connected neural networks
(FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and
SchNet. We find that models using molecular descriptors achieve the best
performance, with GNN models also achieving good performance. We perform
extensive error analysis to understand the molecular properties that influence
model performance, perform feature analysis to understand which information
about molecular structure is most valuable for prediction, and perform a
transfer learning and data size study to understand the impact of data
availability on model performance.
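The abstract singles out GNNs on molecular graphs as one of the strongest approaches. To show roughly what "operating on a molecular graph" means, here is a minimal sketch of a generic sum-aggregation message-passing network. It runs over a heavy-atom graph and sums the atom embeddings into a single molecule vector, which a linear head maps to a logS value. This is not the authors' GNN. The update rule, the one-hot atom features, the random untrained weights, and the names `gnn_readout` and `predict_log_solubility` are all hypothetical choices made for illustration.

```python
import numpy as np

def gnn_readout(adjacency, atom_features, w_self, w_neigh, num_layers=2):
    """Hypothetical sum-aggregation message passing (not the paper's model).

    adjacency:     (n, n) symmetric 0/1 bond matrix over heavy atoms
    atom_features: (n, d) per-atom feature matrix
    w_self, w_neigh: (d, d) weights, so layers can be stacked
    Returns a (d,) molecule vector from a permutation-invariant sum readout.
    """
    h = atom_features
    for _ in range(num_layers):
        neighbour_sum = adjacency @ h  # each atom sums its bonded neighbours
        h = np.maximum(0.0, h @ w_self + neighbour_sum @ w_neigh)  # ReLU update
    return h.sum(axis=0)  # summing over atoms ignores atom ordering

def predict_log_solubility(graph_vector, w_head, b_head):
    # Hypothetical linear regression head on the molecule vector.
    return float(graph_vector @ w_head + b_head)

rng = np.random.default_rng(0)
d = 2  # feature width: one-hot over {C, O}
# Ethanol heavy-atom graph: C0-C1-O2
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
atoms = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
w_self = rng.normal(size=(d, d))   # untrained, random weights
w_neigh = rng.normal(size=(d, d))
w_head = rng.normal(size=d)

vec = gnn_readout(adjacency, atoms, w_self, w_neigh)

# Relabel the atoms (O first); the molecule vector should not change.
perm = [2, 0, 1]
P = np.eye(3)[perm]  # permutation matrix: P @ X == X[perm]
vec_perm = gnn_readout(P @ adjacency @ P.T, P @ atoms, w_self, w_neigh)
print(np.allclose(vec, vec_perm))
print(predict_log_solubility(vec, w_head, 0.0))  # meaningless until trained
```

The sum readout is what makes the prediction independent of how atoms happen to be numbered. In a real model the weights would be fitted to measured solubilities, and the atom features would be richer than a two-element one-hot.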
Related papers
- FragNet: A Graph Neural Network for Molecular Property Prediction with Four Layers of Interpretability [0.7499722271664147]
We introduce the FragNet architecture, a graph neural network capable of achieving prediction accuracies comparable to the current state-of-the-art models.
This model enables understanding of which atoms, covalent bonds, molecular fragments, and molecular fragment connections are critical in the prediction of a given molecular property.
The interpretable capabilities of FragNet are key to gaining scientific insights from the model's learned patterns between molecular structure and molecular properties.
(2024-10-16T01:37:01Z)
- Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction [9.388979080270103]
We construct multimodal deep learning models to cover different molecular representations.
Our multimodal fused deep learning (MMFDL) models outperform mono-modal models in accuracy, reliability, and robustness to noise.
(2023-12-29T07:19:42Z)
- Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network [1.8936798735951967]
We employ two machine learning models: a linear regression model and a graph convolutional neural network (GCNN) model, using various experimental datasets.
The GCNN model has limited interpretability, whereas the linear regression model allows scientists to analyze the underlying factors in greater depth.
From the perspective of chemistry, using the linear regression model, we elucidated the impact of individual atom species and functional groups on overall solubility.
(2023-08-23T15:35:20Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
(2023-02-04T01:32:40Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
(2022-09-12T00:56:57Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel class of machine learning methods that work directly on the molecular graph.
GNNs allow properties to be learned end to end, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
(2022-07-25T11:30:44Z)
- Accurate Prediction of Free Solvation Energy of Organic Molecules via Graph Attention Network and Message Passing Neural Network from Pairwise Atomistic Interactions [14.87390785780636]
We propose two novel models for predicting solvation free energy, based on Graph Neural Network (GNN) architectures.
GNNs are capable of summarizing the predictive information of a molecule as low-dimensional features directly from its graph structure.
We show that our proposed models outperform all quantum mechanical and molecular dynamics methods, as well as existing machine learning based approaches, in solvation free energy prediction.
(2021-04-15T22:15:18Z)
- Few-Shot Graph Learning for Molecular Property Prediction [46.60746023179724]
We propose Meta-MGNN, a novel model for few-shot molecular property prediction.
To exploit unlabeled molecular information, Meta-MGNN further incorporates molecular structure, attribute-based self-supervised modules, and self-attentive task weights.
Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.
(2021-02-16T01:55:34Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout training of the framework.
(2020-07-07T04:22:39Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
(2020-06-18T08:37:04Z)
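The linear-regression entry in the list (Predicting Drug Solubility Using Different Machine Learning Methods) attributes solubility to individual atom species. One way to picture that kind of interpretability is to regress logS on per-molecule atom counts with ordinary least squares and read each coefficient as that element's additive contribution. The feature set, the coefficient values, and `fit_linear_solubility` are hypothetical. The data are synthetic and noiseless, so the fit only recovers made-up coefficients and does not report any real result.

```python
import numpy as np

FEATURES = ["C", "N", "O", "Cl"]  # hypothetical atom-count features

def fit_linear_solubility(counts, log_s):
    """Ordinary least squares: log_s ~ intercept + counts @ coef."""
    X = np.column_stack([np.ones(len(counts)), counts])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, log_s, rcond=None)
    return coef[0], dict(zip(FEATURES, coef[1:]))

# Synthetic data built from made-up "true" contributions, used only to check the fit.
true_intercept = 0.5
true_coef = np.array([-0.3, 0.2, 0.4, -0.6])
rng = np.random.default_rng(1)
counts = rng.integers(0, 10, size=(50, 4)).astype(float)  # 50 fake molecules
log_s = true_intercept + counts @ true_coef

intercept, contributions = fit_linear_solubility(counts, log_s)
for atom, c in contributions.items():
    print(f"{atom}: {c:+.3f}")  # sign and size read as per-atom effects
```

With measured solubilities, the fit would be noisy and should be judged on held-out molecules. The interpretability comes from each coefficient mapping to one named feature, which a GCNN's learned weights do not offer directly.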