Accurate Molecular-Orbital-Based Machine Learning Energies via
Unsupervised Clustering of Chemical Space
- URL: http://arxiv.org/abs/2204.09831v1
- Date: Thu, 21 Apr 2022 00:56:16 GMT
- Title: Accurate Molecular-Orbital-Based Machine Learning Energies via
Unsupervised Clustering of Chemical Space
- Authors: Lixue Cheng, Jiace Sun, Thomas F. Miller III
- Abstract summary: We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML).
This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce an unsupervised clustering algorithm to improve training
efficiency and accuracy in predicting energies using molecular-orbital-based
machine learning (MOB-ML). This work determines clusters via the Gaussian
mixture model (GMM) in an entirely automatic manner and simplifies an earlier
supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by
eliminating both the necessity for user-specified parameters and the training
of an additional classifier. Unsupervised clustering results from GMM have the
advantage of accurately reproducing chemically intuitive groupings of frontier
molecular orbitals and having improved performance with an increasing number of
training examples. The resulting clusters from supervised or unsupervised
clustering are further combined with scalable Gaussian process regression (GPR)
or linear regression (LR) to learn molecular energies accurately by generating
a local regression model in each cluster. Among all four combinations of
regressors and clustering methods, GMM combined with scalable exact Gaussian
process regression (GMM/GPR) is the most efficient training protocol for
MOB-ML. The numerical tests of molecular energy learning on thermalized
datasets of drug-like molecules demonstrate the improved accuracy,
transferability, and learning efficiency of GMM/GPR over the other
training protocols for MOB-ML, i.e., supervised regression clustering combined
with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best
molecular energy predictions compared with those in the literature on the same
benchmark datasets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in
wall-clock training time compared with scalable exact GPR with a training size
of 6500 QM7b-T molecules.
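The protocol described above reduces to two steps: cluster the MOB feature space with a GMM, then fit an independent local regressor (GPR or LR) in each cluster and sum the per-pair predictions to obtain the molecular correlation energy. The sketch below illustrates that pipeline with scikit-learn; the feature matrix `X`, pair energies `y`, the fixed number of mixture components, and the Matern kernel are placeholder assumptions, and it does not reproduce the paper's scalable exact GPR or its automatic selection of the number of clusters.

```python
# Minimal sketch of a GMM/GPR-style pipeline, assuming precomputed MOB-ML
# pair features X (n_pairs x n_features) and pair correlation energies y.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def train_gmm_gpr(X, y, n_components=20, seed=0):
    """Cluster the feature space with a GMM, then fit one local GPR per cluster."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(X)
    labels = gmm.predict(X)
    local_models = {}
    for k in np.unique(labels):
        mask = labels == k
        kernel = Matern(nu=2.5) + WhiteKernel()   # placeholder kernel choice
        local_models[k] = GaussianProcessRegressor(
            kernel=kernel, normalize_y=True).fit(X[mask], y[mask])
    return gmm, local_models

def predict_total_energy(gmm, local_models, X_new):
    """Route each pair feature to its cluster and sum the local predictions."""
    labels = gmm.predict(X_new)
    contributions = np.zeros(len(X_new))
    for k, model in local_models.items():
        mask = labels == k
        if mask.any():
            contributions[mask] = model.predict(X_new[mask])
    return contributions.sum()   # total correlation energy of one molecule
```

In the same spirit, the per-cluster GPRs could be swapped for per-cluster linear regressions to mimic the cheaper GMM/LR variant among the four regressor/clustering combinations mentioned in the abstract.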
Related papers
- Hierarchical Matrix Completion for the Prediction of Properties of Binary Mixtures [3.0478550046333965]
We introduce a novel generic approach for improving data-driven models.
We lump components that behave similarly into chemical classes and model them jointly.
Using clustering leads to significantly improved predictions compared to an MCM without clustering.
arXiv Detail & Related papers (2024-10-08T14:04:30Z) - Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z) - Understanding Reinforcement Learning-Based Fine-Tuning of Diffusion Models: A Tutorial and Review [63.31328039424469]
This tutorial provides a comprehensive survey of methods for fine-tuning diffusion models to optimize downstream reward functions.
We explain the application of various RL algorithms, including PPO, differentiable optimization, reward-weighted MLE, value-weighted sampling, and path consistency learning.
arXiv Detail & Related papers (2024-07-18T17:35:32Z) - Adaptive Fuzzy C-Means with Graph Embedding [84.47075244116782]
Fuzzy clustering algorithms can be roughly categorized into two main groups: Fuzzy C-Means (FCM) based methods and mixture model based methods.
We propose a novel FCM-based clustering model that is capable of automatically learning an appropriate membership-degree hyperparameter value.
arXiv Detail & Related papers (2024-05-22T08:15:50Z) - Towards Predicting Equilibrium Distributions for Molecular Systems with
Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z) - Molecular-orbital-based Machine Learning for Open-shell and
Multi-reference Systems with Kernel Addition Gaussian Process Regression [0.0]
We introduce a novel machine learning strategy, kernel addition Gaussian process regression (KA-GPR), in molecular-orbital-based machine learning (MOB-ML) to learn the total correlation energies of general electronic structure theories for closed- and open-shell systems (a generic sum-of-kernels GPR sketch follows this list).
The learning efficiency of MOB-ML (KA-GPR) is the same as that of the original MOB-ML method for the smallest Criegee molecule, a closed-shell molecule with multi-reference character.
arXiv Detail & Related papers (2022-07-17T23:20:19Z) - Molecular Dipole Moment Learning via Rotationally Equivariant Gaussian
Process Regression with Derivatives in Molecular-orbital-based Machine
Learning [0.0]
This study extends the accurate and transferable molecular-orbital-based machine learning (MOB-ML) approach.
A molecular-orbital-based (MOB) pairwise decomposition of the correlation part of the dipole moment is applied.
The proposed problem setup, feature design, and ML algorithm are shown to provide highly-accurate models.
arXiv Detail & Related papers (2022-05-31T02:42:50Z) - Improving Molecular Contrastive Learning via Faulty Negative Mitigation
and Decomposed Fragment Contrast [17.142976840521264]
We propose iMolCLR, an improvement of Molecular Contrastive Learning of Representations with graph neural networks (GNNs).
Experiments have shown that the proposed strategies significantly improve the performance of GNN models.
iMolCLR intrinsically embeds scaffolds and functional groups that can be used to reason about molecular similarities.
arXiv Detail & Related papers (2022-02-18T18:33:27Z) - Molecular Energy Learning Using Alternative Blackbox Matrix-Matrix
Multiplication Algorithm for Exact Gaussian Process [0.0]
We present an application of the blackbox matrix-matrix multiplication (BBMM) algorithm to scale up the Gaussian Process (GP) training of molecular energies.
An alternative implementation of BBMM (AltBBMM) is also proposed to train more efficiently with the same accuracy and transferability.
The accuracy and transferability of both algorithms are examined on the benchmark of organic molecules with 7 and 13 heavy atoms.
arXiv Detail & Related papers (2021-09-20T19:59:06Z) - Continual Learning with Fully Probabilistic Models [70.3497683558609]
We present an approach for continual learning based on fully probabilistic (or generative) models of machine learning.
We propose a pseudo-rehearsal approach using a Gaussian Mixture Model (GMM) instance for both generator and classifier functionalities.
We show that GMR achieves state-of-the-art performance on common class-incremental learning problems at very competitive time and memory complexity.
arXiv Detail & Related papers (2021-04-19T12:26:26Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAEs) are a powerful and widely used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
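As flagged in the KA-GPR entry above, kernel addition simply means that the GP covariance is a sum of kernels. A generic, hedged illustration with scikit-learn follows; the toy features, targets, and kernel choices are made up for this sketch and are not the KA-GPR construction from that paper.

```python
# Generic sum-of-kernels GPR: the covariance is k_RBF + k_Matern + noise.
# Toy data only; this illustrates the kernel-addition idea, not KA-GPR itself.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))                    # placeholder features
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + 0.01 * rng.normal(size=200)

kernel = RBF(length_scale=1.0) + Matern(nu=2.5) + WhiteKernel(noise_level=1e-4)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

mean, std = gpr.predict(X[:5], return_std=True)          # posterior mean / std
print(mean, std)
```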
This list is automatically generated from the titles and abstracts of the papers in this site.