Related papers: Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations

URL: http://arxiv.org/abs/2105.12638v2
Date: Thu, 27 May 2021 01:03:43 GMT
Title: Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations
Authors: Gihan Panapitiya, Michael Girard, Aaron Hollas, Vijay Murugesan, Wei Wang, Emily Saldanha
Abstract summary: The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules. Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance.
Score: 3.10678679607547
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goal of this study is to develop a general model capable of predicting the solubility of a broad range of organic molecules. Using the largest currently available solubility dataset, we implement deep learning-based models to predict solubility from molecular structure and explore several different molecular representations including molecular descriptors, simplified molecular-input line-entry system (SMILES) strings, molecular graphs, and three-dimensional (3D) atomic coordinates using four different neural network architectures: fully connected neural networks (FCNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, perform feature analysis to understand which information about molecular structure is most valuable for prediction, and perform a transfer learning and data size study to understand the impact of data availability on model performance.
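
To make the idea of "varied molecular representations" concrete, the sketch below derives two of the input forms named in the abstract from a single SMILES string: a token sequence (the kind of input an RNN could consume) and a small fixed-length descriptor vector (the kind of input an FCNN could consume). This is only an illustration under stated assumptions, not the authors' pipeline: the regular expression is a simplified SMILES tokenizer, and the function names `tokenize_smiles` and `count_descriptors` and the choice of descriptors are hypothetical.

```python
import re
from collections import Counter

# Simplified SMILES token pattern (an assumption for illustration only).
# Bracket atoms and two-letter halogens are listed first so they match whole.
_TOKEN_RE = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[=#\-+()/\\.@:~*$]"
)
_AROMATIC = {"b", "c", "n", "o", "p", "s"}


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens (a sequence representation)."""
    tokens = _TOKEN_RE.findall(smiles)
    # findall silently skips unmatched characters, so verify full coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def count_descriptors(smiles):
    """Toy descriptor vector built from token counts.

    Real descriptor sets are far richer; atoms written in brackets
    (e.g. "[Na+]") are not counted by this sketch.
    """
    tokens = tokenize_smiles(smiles)
    # Fold aromatic lowercase atoms (e.g. "c") into their element symbol ("C").
    elements = Counter(t.upper() if t in _AROMATIC else t for t in tokens)
    ring_closures = sum(1 for t in tokens if t.isdigit() or t.startswith("%"))
    return {
        "C": elements["C"],
        "N": elements["N"],
        "O": elements["O"],
        "halogens": elements["F"] + elements["Cl"] + elements["Br"] + elements["I"],
        "ring_closures": ring_closures,
        "double_bonds": elements["="],
    }
```

For example, `tokenize_smiles("CCO")` yields one token per atom of ethanol, while `count_descriptors("CC(=O)Nc1ccc(O)cc1")` summarizes paracetamol as a handful of counts. The two outputs carry different information: the token sequence preserves atom order and branching, whereas the descriptor vector discards it in exchange for a fixed length.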

Related papers

Learning Hierarchical Interaction for Accurate Molecular Property Prediction [8.488251667425887]
We propose a Hierarchical Interaction Message Passing Mechanism, which serves as the foundation of our novel model, HimNet. Our method enables interaction-aware representation learning across atomic, motif, and molecular levels via hierarchical attention-guided message passing. Our method exhibits promising hierarchical interpretability, aligning well with chemical intuition on representative molecules.
arXiv Detail & Related papers (2025-04-28T15:19:28Z)
FragNet: A Graph Neural Network for Molecular Property Prediction with Four Layers of Interpretability [0.7499722271664147]
We introduce the FragNet architecture, a graph neural network capable of achieving prediction accuracies comparable to the current state-of-the-art models. This model enables understanding of which atoms, covalent bonds, molecular fragments, and molecular fragment connections are critical in the prediction of a given molecular property. The interpretable capabilities of FragNet are key to gaining scientific insights from the model's learned patterns between molecular structure and molecular properties.
arXiv Detail & Related papers (2024-10-16T01:37:01Z)
Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction [9.388979080270103]
We construct multimodal deep learning models to cover different molecular representations. Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and resistance capability against noise.
arXiv Detail & Related papers (2023-12-29T07:19:42Z)
Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network [1.8936798735951967]
We employ two machine learning models: a linear regression model and a graph convolutional neural network (GCNN) model, using various experimental datasets. The present GCNN model has limited interpretability, while the linear regression model allows scientists to analyze the underlying factors in greater depth. From the perspective of chemistry, using the linear regression model, we elucidated the impact of individual atom species and functional groups on overall solubility.
arXiv Detail & Related papers (2023-08-23T15:35:20Z)
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph. GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors. We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
Augmenting Molecular Deep Generative Models with Topological Data Analysis Representations [21.237758981760784]
We present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules. Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties.
arXiv Detail & Related papers (2021-06-08T15:49:21Z)
Accurate Prediction of Free Solvation Energy of Organic Molecules via Graph Attention Network and Message Passing Neural Network from Pairwise Atomistic Interactions [14.87390785780636]
We propose two novel models for the problem of free solvation energy predictions, based on the Graph Neural Network (GNN) architectures. GNNs are capable of summarizing the predictive information of a molecule as low-dimensional features directly from its graph structure. We show that our proposed models outperform all quantum mechanical and molecular dynamics methods in addition to existing alternative machine learning based approaches in the task of solvation free energy prediction.
arXiv Detail & Related papers (2021-04-15T22:15:18Z)
Few-Shot Graph Learning for Molecular Property Prediction [46.60746023179724]
We propose Meta-MGNN, a novel model for few-shot molecular property prediction. To exploit unlabeled molecular information, Meta-MGNN further incorporates molecular structure, attribute based self-supervised modules and self-attentive task weights. Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.
arXiv Detail & Related papers (2021-02-16T01:55:34Z)
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the training of the framework.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning. GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.