Efficient Chemical Space Exploration Using Active Learning Based on
Marginalized Graph Kernel: an Application for Predicting the Thermodynamic
Properties of Alkanes with Molecular Simulation
- URL: http://arxiv.org/abs/2209.00514v1
- Date: Thu, 1 Sep 2022 14:59:13 GMT
- Title: Efficient Chemical Space Exploration Using Active Learning Based on
Marginalized Graph Kernel: an Application for Predicting the Thermodynamic
Properties of Alkanes with Molecular Simulation
- Authors: Yan Xiang, Yu-Hang Tang, Zheng Gong, Hongyi Liu, Liang Wu, Guang Lin,
Huai Sun
- Abstract summary: We use high-throughput molecular dynamics simulation to generate data and a graph neural network (GNN) for prediction.
Specifically, we target 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties.
Validation shows that only 313 molecules were sufficient to train an accurate GNN model with $\rm R^2 > 0.99$ for computational test sets and $\rm R^2 > 0.94$ for experimental test sets.
- Score: 10.339394156446982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an explorative active learning (AL) algorithm based on Gaussian
process regression and marginalized graph kernel (GPR-MGK) to explore chemical
space with minimum cost. Using high-throughput molecular dynamics simulation to
generate data and a graph neural network (GNN) for prediction, we constructed an
active learning molecular simulation framework for thermodynamic property
prediction. Specifically, targeting 251,728 alkane molecules consisting of 4 to
19 carbon atoms and their liquid physical properties: densities, heat
capacities, and vaporization enthalpies, we use the AL algorithm to select the
most informative molecules to represent the chemical space. Validation on
computational and experimental test sets shows that only 313 (0.124\% of the
total) molecules were sufficient to train an accurate GNN model with $\rm R^2 >
0.99$ for computational test sets and $\rm R^2 > 0.94$ for experimental test
sets. We highlight two advantages of the presented AL algorithm: compatibility
with high-throughput data generation and reliable uncertainty quantification.
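For intuition, below is a minimal sketch of the explorative selection step described in the abstract: molecules are greedily picked from an unlabeled pool by maximizing the Gaussian process predictive variance. A plain RBF kernel on numeric descriptors stands in for the marginalized graph kernel over molecular graphs, and all names (rbf_kernel, posterior_variance, explorative_selection, X_pool) are illustrative assumptions, not the authors' implementation.

```python
# Sketch: explorative active learning via GP posterior variance.
# The paper's GPR-MGK uses a marginalized graph kernel on molecular graphs;
# here an RBF kernel on feature vectors is a stand-in so the example runs.
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    """Stand-in kernel; the paper uses a marginalized graph kernel."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def posterior_variance(K_pool, K_cross, K_train, noise=1e-6):
    """GP predictive variance of pool points given the selected training set."""
    L = np.linalg.cholesky(K_train + noise * np.eye(len(K_train)))
    v = np.linalg.solve(L, K_cross)            # shape (n_train, n_pool)
    return np.diag(K_pool) - (v ** 2).sum(0)

def explorative_selection(X_pool, n_select, seed=0):
    """Greedily pick the pool molecule with the largest predictive variance."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X_pool)))]    # random seed molecule
    K_pool = rbf_kernel(X_pool, X_pool)
    while len(selected) < n_select:
        K_train = K_pool[np.ix_(selected, selected)]
        K_cross = K_pool[np.ix_(selected, range(len(X_pool)))]
        var = posterior_variance(K_pool, K_cross, K_train)
        var[selected] = -np.inf                    # never re-pick a molecule
        selected.append(int(np.argmax(var)))
    return selected

# Toy usage: random descriptors for a small pool; the paper's pool holds
# 251,728 alkanes represented as graphs, not feature vectors.
X_pool = np.random.default_rng(1).normal(size=(500, 8))
chosen = explorative_selection(X_pool, n_select=20)
print(len(chosen), "molecules selected for MD labeling / GNN training")
```

In the framework described above, the selected molecules would then be labeled by high-throughput molecular dynamics simulation and used to train the GNN property predictor.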
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - $\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials [35.949502493236146]
This work presents a new dataset and benchmark called $\nabla^2$DFT that is based on the nablaDFT dataset.
It contains twice as many molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models.
$\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules.
arXiv Detail & Related papers (2024-06-20T14:14:59Z) - Using GNN property predictors as molecule generators [16.34646723046073]
Graph neural networks (GNNs) have emerged as powerful tools to accurately predict materials and molecular properties.
In this article, we exploit the invertible nature of these neural networks to directly generate molecular structures with desired electronic properties.
arXiv Detail & Related papers (2024-06-05T13:53:47Z) - QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules [69.25826391912368]
We generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
arXiv Detail & Related papers (2023-06-15T23:39:07Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule
Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE improvement of 35.1% on regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Gibbs-Helmholtz Graph Neural Network: capturing the temperature
dependency of activity coefficients at infinite dilution [1.290382979353427]
We develop the Gibbs-Helmholtz Graph Neural Network (GH-GNN) model for predicting $\ln \gamma_{ij}^{\infty}$ of molecular systems at different temperatures.
We analyze the performance of GH-GNN for continuous and discrete inter/extrapolation and give indications for the model's applicability domain and expected accuracy.
arXiv Detail & Related papers (2022-12-02T14:25:58Z) - Predicting CO$_2$ Absorption in Ionic Liquids with Molecular Descriptors
and Explainable Graph Neural Networks [9.04563945965023]
Ionic liquids (ILs) provide a promising solution for CO$_2$ capture and storage to mitigate global warming.
In this work, we develop both fingerprint-based Machine Learning models and Graph Neural Networks (GNNs) to predict CO$_2$ absorption in ILs.
Our method outperforms previous ML models by reaching a high accuracy (MAE of 0.0137, $R^2$ of 0.9884).
arXiv Detail & Related papers (2022-09-29T18:31:12Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme, MOOD, that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist in learning molecule representations.
Our approach is proven effective in 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z) - ASGN: An Active Semi-supervised Graph Neural Network for Molecular
Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn general representations that jointly exploit information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.