A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility
- URL: http://arxiv.org/abs/2409.04301v1
- Date: Fri, 6 Sep 2024 14:20:38 GMT
- Title: A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility
- Authors: Muniba Batool, Naveed Ahmed Azam, Jianshen Zhu, Kazuya Haraguchi, Liang Zhao, Tatsuya Akutsu
- Abstract summary: Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design.
We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors.
- Score: 5.763661159910719
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR) and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy compared with existing approaches, in the range [0.7191, 0.9377] across 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time, ranging from 6 to 1204 seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.
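The abstract names forward stepwise descriptor selection combined with MLR but gives no procedural detail. The following is a minimal sketch of that general technique, not the authors' implementation: the function names (`fit_mlr`, `r2_score`, `forward_stepwise`), the toy data, and the use of training-set R^2 as the selection criterion are all assumptions made for illustration. It uses only the Python standard library to stay self-contained; in practice a numerical library would normally handle the least-squares fit.

```python
from typing import List, Sequence, Tuple


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [intercept, w_1, ..., w_k]. Raises ValueError if the system is singular.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular normal equations")
        M[c], M[p] = M[p], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def r2_score(X: Sequence[Sequence[float]], y: Sequence[float],
             coef: Sequence[float]) -> float:
    """Coefficient of determination of the linear model on (X, y)."""
    pred = [coef[0] + sum(w * v for w, v in zip(coef[1:], r)) for r in X]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X: Sequence[Sequence[float]], y: Sequence[float],
                     max_features: int,
                     min_gain: float = 1e-6) -> Tuple[List[int], float]:
    """Greedily add the descriptor column that most improves R^2.

    Assumption: training R^2 is the criterion; a real study would more
    likely use cross-validated scores.
    """
    selected: List[int] = []
    best_r2 = 0.0
    while len(selected) < max_features:
        candidates = []
        for j in range(len(X[0])):
            if j in selected:
                continue
            cols = selected + [j]
            sub = [[r[c] for c in cols] for r in X]
            try:
                candidates.append((r2_score(sub, y, fit_mlr(sub, y)), j))
            except ValueError:
                continue  # skip collinear candidates
        if not candidates:
            break
        r2, j = max(candidates)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        best_r2 = r2
    return selected, best_r2


# Toy data (hypothetical): 4 descriptor columns, target = 1 + 2*x0 - 3*x2.
X = [[1, 5, 2, 0], [2, 3, 1, 1], [3, 8, 4, 0], [4, 1, 3, 1],
     [5, 9, 0, 0], [6, 2, 5, 1], [7, 7, 2, 0], [8, 4, 6, 1]]
y = [1 + 2 * r[0] - 3 * r[2] for r in X]

selected, score = forward_stepwise(X, y, max_features=2)
coef = fit_mlr([[r[c] for c in selected] for r in X], y)
print(selected, coef)
```

On this toy data the greedy search picks column 2 first and column 0 second, after which the linear model fits the target essentially exactly. A linear model of this form is also what makes an MILP encoding straightforward, since the prediction is a linear function of the descriptor values.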
Related papers
- BAPULM: Binding Affinity Prediction using Language Models [7.136205674624813]
We introduce BAPULM, an innovative sequence-based framework that leverages the chemical latent representations of proteins via ProtT5-XL-U50 and through MolFormer.
Our approach was validated extensively on benchmark datasets, achieving sequential scoring power (R) values of 0.925 $\pm$ 0.043, 0.914 $\pm$ 0.004, and 0.8132 $\pm$ 0.001 on benchmark1k2101, Test2016_290, and CSAR-HiQ_36, respectively.
arXiv Detail & Related papers (2024-11-06T04:35:30Z)
- MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data [22.262191225577244]
We explore whether a similar approach can be applied to scientific foundation models (SFMs).
We collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries.
We provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrates that pre-training remains robust even with approximated prior data.
arXiv Detail & Related papers (2024-10-09T00:52:00Z)
- ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth.
Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format.
This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful.
We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z)
- Accelerating Drug Safety Assessment using Bidirectional-LSTM for SMILES Data [0.0]
Bi-Directional Long Short Term Memory (BiLSTM) is a variant of Recurrent Neural Network (RNN) that processes input molecular sequences.
The proposed work aims to understand the sequential patterns encoded in the SMILES strings, which are then utilised for predicting the toxicity of the molecules.
arXiv Detail & Related papers (2024-07-08T18:12:11Z)
- YZS-model: A Predictive Model for Organic Drug Solubility Based on Graph Convolutional Networks and Transformer-Attention [9.018408514318631]
Traditional methods often miss complex molecular structures, leading to inaccuracies.
We introduce the YZS-Model, a deep learning framework integrating Graph Convolutional Networks (GCN), Transformer architectures, and Long Short-Term Memory (LSTM) networks.
YZS-Model achieved an $R^2$ of 0.59 and an RMSE of 0.57, outperforming benchmark models.
arXiv Detail & Related papers (2024-06-27T12:40:29Z)
- Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space.
Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
- A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments.
A novel aspect of our model is that the kernel contains a scaling parameter that controls the strength of the correlation between elements of the chemical space.
Using molecular fingerprints, a numerical representation of a compound's location within the chemical space, we show that accounting for correlation amongst chemical compounds improves predictive performance.
arXiv Detail & Related papers (2024-05-16T11:18:32Z)
- Local manifold learning and its link to domain-based physics knowledge [53.15471241298841]
In many reacting flow systems, the thermo-chemical state-space is assumed to evolve close to a low-dimensional manifold (LDM).
We show that PCA applied in local clusters of data (local PCA) is capable of detecting the intrinsic parameterization of the thermo-chemical state-space.
arXiv Detail & Related papers (2022-07-01T09:06:25Z)
- RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning [107.64562550844146]
Retrosynthesis is an emerging research area of deep learning.
We propose a new approach that reformulates retrosynthesis as a selection problem of reactants from a candidate set of commercially available molecules.
For learning the score functions, we also propose a novel contrastive training scheme with hard negative mining.
arXiv Detail & Related papers (2021-05-03T12:47:57Z)
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
- Retrosynthesis Prediction with Conditional Graph Logic Network [118.70437805407728]
Computer-aided retrosynthesis is finding renewed interest from both chemistry and computer science communities.
We propose a new approach to this task using the Conditional Graph Logic Network, a conditional graphical model built upon graph neural networks.
arXiv Detail & Related papers (2020-01-06T05:36:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.