Enhanced sampling of robust molecular datasets with uncertainty-based collective variables
- URL: http://arxiv.org/abs/2402.03753v1
- Date: Tue, 6 Feb 2024 06:42:51 GMT
- Title: Enhanced sampling of robust molecular datasets with uncertainty-based collective variables
- Authors: Aik Rui Tan, Johannes C. B. Dietschreit, Rafael Gomez-Bombarelli
- Abstract summary: We propose a method that leverages uncertainty as the collective variable (CV) to guide the acquisition of chemically relevant data points.
This approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating a data set that is representative of the accessible configuration
space of a molecular system is crucial for the robustness of machine-learned
interatomic potentials (MLIPs). However, the complexity of molecular systems,
characterized by intricate potential energy surfaces (PESs) with numerous local
minima and energy barriers, presents a significant challenge. Traditional
methods of data generation, such as random sampling or exhaustive exploration,
are either intractable or may miss rare but highly informative configurations.
In this study, we propose a method that leverages uncertainty as the collective
variable (CV) to guide the acquisition of chemically relevant data points,
focusing on regions of the configuration space where ML model predictions are
most uncertain. This approach employs a Gaussian Mixture Model-based
uncertainty metric from a single model as the CV for biased molecular dynamics
simulations. We demonstrate on the alanine dipeptide benchmark system that our
approach overcomes energy barriers and explores unseen energy minima, thereby
enhancing the data set in an active learning framework.
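The abstract describes a loop: fit a density model to the training features, treat the resulting uncertainty as a CV, and bias the dynamics along that CV so the simulation leaves the region the model already knows. The sketch below illustrates that loop under loudly stated assumptions. It is not the authors' implementation. It assumes (i) the uncertainty is the negative log-likelihood of a feature vector under a Gaussian mixture fit to the training-set features, computed from a single model rather than an ensemble; (ii) raw 2D coordinates stand in for the MLIP's learned features; (iii) a toy double well replaces the alanine dipeptide PES; and (iv) the bias is a metadynamics-style sum of Gaussian hills deposited in CV space, propagated with overdamped Langevin dynamics. All function names (`fit_gmm`, `gmm_uncertainty`, `double_well_grad`, `biased_langevin`) and parameter values are hypothetical and untuned.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def log_gaussians(X, means, var):
    """log N(x_i | mu_j, diag(var_j)) for every sample i and component j, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff**2 / var[None], axis=2)
                   + np.sum(np.log(2.0 * np.pi * var), axis=1)[None, :])


def fit_gmm(X, k, n_iter=200, seed=0):
    """Fit a diagonal-covariance Gaussian mixture to training features X (n, d) by EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, k, replace=False)]
    var = np.tile(X.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        log_p = log_gaussians(X, means, var) + np.log(weights)      # E-step
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        nk = resp.sum(axis=0) + 1e-10                                 # M-step
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
    return weights, means, var


def gmm_uncertainty(q, gmm):
    """Assumed CV: s(q) = -log p(q) under the GMM, plus its analytic gradient ds/dq."""
    weights, means, var = gmm
    log_p = log_gaussians(q[None, :], means, var)[0] + np.log(weights)
    log_pq = logsumexp(log_p[None, :], axis=1)[0]
    resp = np.exp(log_p - log_pq)                      # posterior over components
    grad = np.sum(resp[:, None] * (q - means) / var, axis=0)
    return -log_pq, grad


def double_well_grad(q):
    """Gradient of a toy PES V(x, y) = (x^2 - 1)^2 + 0.5 y^2 (hypothetical MLIP stand-in)."""
    x, y = q
    return np.array([4.0 * x * (x**2 - 1.0), y])


def biased_langevin(q0, gmm, n_steps=5000, dt=1e-3, kT=0.1,
                    height=0.05, width=1.0, stride=100, seed=1):
    """Overdamped Langevin dynamics with metadynamics-style hills on the CV s(q)."""
    rng = np.random.default_rng(seed)
    q = np.array(q0, dtype=float)
    centers = []                                       # CV values where hills were dropped
    traj = np.empty((n_steps, q.size))
    for step in range(n_steps):
        s, ds_dq = gmm_uncertainty(q, gmm)
        c = np.asarray(centers, dtype=float)
        # dV_bias/ds for V_bias(s) = sum_i height * exp(-(s - c_i)^2 / (2 width^2))
        dvb_ds = np.sum(-height * (s - c) / width**2 * np.exp(-(s - c)**2 / (2 * width**2)))
        force = -(double_well_grad(q) + dvb_ds * ds_dq)   # chain rule onto coordinates
        q = q + dt * force + np.sqrt(2.0 * kT * dt) * rng.standard_normal(q.size)
        if step % stride == 0:
            centers.append(s)
        traj[step] = q
    return traj


# Assumption: training "features" are coordinates sampled only in the left basin (x ~ -1).
rng = np.random.default_rng(0)
train = np.column_stack([rng.normal(-1.0, 0.3, 500), rng.normal(0.0, 0.5, 500)])
gmm = fit_gmm(train, k=2)
traj = biased_langevin([-1.0, 0.0], gmm)

# Acquisition step: the most uncertain frames are candidates for new reference labels.
u_traj = np.array([gmm_uncertainty(q, gmm)[0] for q in traj])
to_label = traj[np.argsort(u_traj)[-10:]]
print("max x visited:", traj[:, 0].max())
```

Because the bias acts on s(q) rather than on a hand-picked geometric coordinate, it reaches the coordinates only through ds/dq, which is why the sketch returns the analytic gradient of the mixture log-likelihood alongside its value. In a real MLIP workflow the features, and hence s and ds/dq, would presumably come from the network itself; here they are replaced by coordinates purely for illustration.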
Related papers
- Learning Invariant Molecular Representation in Latent Discrete Space (2023-10-22T04:06:44Z)
  We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
  Our model generalizes better than state-of-the-art baselines in the presence of various distribution shifts.
- On the Interplay of Subset Selection and Informed Graph Neural Networks (2023-06-15T09:09:27Z)
  This work focuses on predicting the atomization energy of molecules in the QM9 dataset.
  We show how maximizing molecular diversity during training-set selection increases the robustness of linear and nonlinear regression techniques.
  We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer.
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning (2023-06-08T17:12:08Z)
  We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
  DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
- Generative structured normalizing flow Gaussian processes applied to spectroscopic data (2022-12-14T23:57:46Z)
  In the physical sciences, limited training data may not adequately characterize future observed data.
  It is critical that models adequately indicate uncertainty, particularly when they may be asked to extrapolate.
  We demonstrate the methodology on laser-induced breakdown spectroscopy data from the ChemCam instrument onboard the Mars rover Curiosity.
- Retrieval-based Controllable Molecule Generation (2022-08-23T17:01:16Z)
  We propose a new retrieval-based framework for controllable molecule generation.
  We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
  Our approach is agnostic to the choice of generative model and requires no task-specific fine-tuning.
- Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes (2022-07-16T10:41:41Z)
  We show that variational learning of the inducing points in a molecular descriptor space improves the prediction of energies and atomic forces on two molecular dynamics datasets.
  We extend our study to a large molecular crystal system, showing that variational GP models perform well for predicting atomic forces by efficiently learning a sparse representation of the dataset.
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations (2022-05-17T13:08:28Z)
  Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
  Recently, machine-learned force fields (MLFFs) have emerged as an alternative means to execute MD simulations.
  This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
- A2I Transformer: Permutation-equivariant attention network for pairwise and many-body interactions with minimal featurization (2021-10-27T12:18:25Z)
  In this work, we suggest an end-to-end model that directly predicts per-atom energy from the coordinates of particles.
  We tested our model on several challenges in molecular simulation, including periodic boundary conditions (PBC), n-body interactions, and binary composition.
- Federated Learning of Molecular Properties in a Heterogeneous Setting (2021-09-15T12:49:13Z)
  We introduce federated heterogeneous molecular learning to address these challenges.
  Federated learning allows end-users to build a global model collaboratively while keeping the training data distributed across isolated clients.
  FedChem should enable a new type of collaboration for improving AI in chemistry that mitigates concerns about valuable chemical data.
- Learning Neural Generative Dynamics for Molecular Conformation Generation (2021-02-20T03:17:58Z)
  We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
  We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
- Embedded-physics machine learning for coarse-graining and collective variable discovery without data (2020-02-24T10:28:41Z)
  We present a novel learning framework that consistently embeds underlying physics.
  We propose a novel objective based on the reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field.
  We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and alanine dipeptide.
This list is automatically generated from the titles and abstracts of the papers on this site.