Transition1x -- a Dataset for Building Generalizable Reactive Machine
Learning Potentials
- URL: http://arxiv.org/abs/2207.12858v1
- Date: Mon, 25 Jul 2022 07:30:14 GMT
- Title: Transition1x -- a Dataset for Building Generalizable Reactive Machine
Learning Potentials
- Authors: Mathias Schreiner, Arghya Bhowmik, Tejs Vegge, Jonas Busk, Ole Winther
- Abstract summary: We present the dataset Transition1x containing 9.6 million Density Functional Theory (DFT) calculations.
We show that ML models cannot learn features in transition-state regions solely by training on hitherto popular benchmark datasets.
- Score: 7.171984408392421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) models have, in contrast to their usefulness in
molecular dynamics studies, had limited success as surrogate potentials for
reaction barrier search. This is due to the scarcity of training data in relevant
transition-state regions of chemical space. Currently available datasets for
training ML models on small molecular systems almost exclusively contain
configurations at or near equilibrium. In this work, we present the dataset
Transition1x containing 9.6 million Density Functional Theory (DFT)
calculations of forces and energies of molecular configurations on and around
reaction pathways at the wB97x/6-31G(d) level of theory. The data was generated
by running Nudged Elastic Band (NEB) calculations with DFT on 10k reactions
while saving intermediate calculations. We train state-of-the-art equivariant
graph message-passing neural network models on Transition1x and cross-validate
on the popular ANI1x and QM9 datasets. We show that ML models cannot learn
features in transition-state regions solely by training on hitherto popular
benchmark datasets. Transition1x is a challenging new benchmark that will
provide an important step towards developing next-generation ML force fields
that also work far from equilibrium configurations and for reactive systems.
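The generation procedure described in the abstract, running NEB with DFT and keeping every intermediate force and energy evaluation, can be sketched with ASE. The sketch below is illustrative rather than the authors' pipeline: the EMT calculator stands in for the wB97x/6-31G(d) DFT calculations, and the endpoint files `reactant.xyz`/`product.xyz` are hypothetical.
```python
from ase.io import read
from ase.neb import NEB
from ase.optimize import BFGS
from ase.calculators.emt import EMT  # stand-in for the DFT calculator

# Hypothetical endpoint geometries; Transition1x uses wB97x/6-31G(d) DFT, not EMT.
initial = read("reactant.xyz")
final = read("product.xyz")

# Band of images interpolated between reactant and product.
images = [initial] + [initial.copy() for _ in range(8)] + [final]
for image in images:
    image.calc = EMT()

neb = NEB(images, climb=True)   # climbing-image NEB
neb.interpolate(method="idpp")  # image-dependent pair potential interpolation

# Save every intermediate configuration with its energy and forces,
# mimicking how Transition1x records all evaluations along the NEB run.
snapshots = []

def save_band():
    for image in images[1:-1]:
        snapshots.append((image.get_positions().copy(),
                          image.get_potential_energy(),
                          image.get_forces().copy()))

opt = BFGS(neb)
opt.attach(save_band)  # called after every optimization step
opt.run(fmax=0.05)
print(f"collected {len(snapshots)} intermediate configurations")
```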
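Training a force field on such data means fitting energies and forces jointly, with forces obtained as the negative gradient of the predicted energy with respect to atomic positions. A minimal PyTorch sketch of that combined loss follows; `model`, the batch fields, and `force_weight` are illustrative assumptions, not the paper's exact setup.
```python
import torch
import torch.nn.functional as F

def energy_force_loss(model, atomic_numbers, positions, target_E, target_F,
                      force_weight=100.0):
    """Joint energy/force loss with forces from autograd: F_pred = -dE/dR."""
    positions = positions.clone().requires_grad_(True)
    pred_E = model(atomic_numbers, positions)        # shape: (n_molecules,)
    pred_F = -torch.autograd.grad(
        pred_E.sum(), positions, create_graph=True   # keep graph so the loss backprops
    )[0]
    return F.mse_loss(pred_E, target_E) + force_weight * F.mse_loss(pred_F, target_F)
```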
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
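As a toy illustration of the random SMILES masking above: the character-level tokenizer and the `[MASK]` symbol are simplifying assumptions, since the paper masks subsequences tied to specific atoms with a proper SMILES tokenizer.
```python
import random

MASK = "[MASK]"

def mask_smiles(smiles: str, mask_rate: float = 0.15, seed: int = 0) -> str:
    """Randomly replace SMILES tokens with a mask symbol (toy character-level tokenizer)."""
    rng = random.Random(seed)
    tokens = list(smiles)  # real models use an atom-aware SMILES tokenizer
    for i, tok in enumerate(tokens):
        if tok.isalpha() and rng.random() < mask_rate:  # only mask atom characters
            tokens[i] = MASK
    return "".join(tokens)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin, with random atoms masked
```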
- Cascade of phase transitions in the training of Energy-based models [9.945465034701288]
We investigate the feature encoding process in a prototypical energy-based generative model, the Bernoulli-Bernoulli RBM.
Our study tracks the evolution of the model's weight matrix through its singular value decomposition.
We validate our theoretical results by training the Bernoulli-Bernoulli RBM on real data sets.
arXiv Detail & Related papers (2024-05-23T15:25:56Z)
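The SVD-based monitoring in the energy-based-model entry above reduces to logging the singular spectrum of the weight matrix at each epoch; in this sketch the RBM update itself is replaced by a random-walk placeholder, and all names are illustrative.
```python
import numpy as np

def log_singular_values(W: np.ndarray, history: list) -> None:
    """Append the singular values of the RBM weight matrix W (visible x hidden)."""
    s = np.linalg.svd(W, compute_uv=False)
    history.append(s)

# Usage inside a (hypothetical) training loop:
history = []
W = 0.01 * np.random.randn(784, 64)        # toy Bernoulli-Bernoulli RBM weights
for epoch in range(10):
    W += 0.01 * np.random.randn(*W.shape)  # stand-in for a CD-k gradient update
    log_singular_values(W, history)
# Phase transitions show up as singular values successively detaching from the bulk.
```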
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z) - Assessing Neural Network Representations During Training Using
Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - Evaluating the Transferability of Machine-Learned Force Fields for
Material Property Modeling [2.494740426749958]
We present a more comprehensive set of benchmarking tests for evaluating the transferability of machine-learned force fields.
We use a graph neural network (GNN)-based force field coupled with the OpenMM package to carry out MD simulations for Argon.
Our results show that the model can accurately capture the behavior of the solid phase only when the configurations from the solid phase are included in the training dataset.
arXiv Detail & Related papers (2023-01-10T00:25:48Z) - Transfer learning for chemically accurate interatomic neural network
potentials [0.0]
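The argon MD benchmark described above can be sketched with ASE instead of the paper's OpenMM setup; here a Lennard-Jones calculator stands in for the GNN force field, and the cell size and thermostat settings are illustrative.
```python
from ase.lattice.cubic import FaceCenteredCubic
from ase.calculators.lj import LennardJones
from ase.md.langevin import Langevin
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from ase import units

# Solid-phase FCC argon cell; LJ parameters for Ar stand in for the GNN force field.
atoms = FaceCenteredCubic(symbol="Ar", size=(3, 3, 3), pbc=True)
atoms.calc = LennardJones(sigma=3.4, epsilon=0.0104)  # Angstrom, eV

MaxwellBoltzmannDistribution(atoms, temperature_K=60)
dyn = Langevin(atoms, timestep=2 * units.fs, temperature_K=60, friction=0.02)
dyn.run(1000)  # compare e.g. RDFs against reference MD to assess transferability
```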
- Transfer learning for chemically accurate interatomic neural network potentials [0.0]
We show that pre-training the network parameters on data obtained from density functional calculations improves the sample efficiency of models trained on more accurate ab initio data.
We provide GM-NN potentials pre-trained and fine-tuned on the ANI-1x and ANI-1ccx data sets, which can easily be fine-tuned further and applied to organic molecules.
arXiv Detail & Related papers (2022-12-07T19:21:01Z)
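The pre-train/fine-tune recipe in the transfer-learning entry above usually amounts to reloading pretrained weights, optionally freezing representation layers, and continuing at a lower learning rate. A minimal PyTorch sketch follows; the checkpoint path and layer names are hypothetical, not the GM-NN implementation.
```python
import torch

# Hypothetical model pretrained on cheap DFT labels (e.g. ANI-1x level).
model = torch.load("pretrained_on_dft.pt")

# Optionally freeze the representation layers and fine-tune only the readout
# on a smaller set of high-accuracy coupled-cluster labels (ANI-1ccx level).
for name, param in model.named_parameters():
    if name.startswith("embedding") or name.startswith("interaction"):
        param.requires_grad = False  # hypothetical layer names

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4,  # smaller LR than pretraining to avoid catastrophic forgetting
)
```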
- NeuralNEB -- Neural Networks can find Reaction Paths Fast [7.7365628406567675]
Quantum mechanical methods like Density Functional Theory (DFT) are used with great success alongside efficient search algorithms for studying kinetics of reactive systems.
Machine Learning (ML) models have turned out to be excellent emulators of small molecule DFT calculations and could possibly replace DFT in such tasks.
In this paper we train state-of-the-art equivariant Graph Neural Network (GNN)-based models on around 10,000 elementary reactions from the Transition1x dataset.
arXiv Detail & Related papers (2022-07-20T15:29:45Z)
- Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
- Automated discovery of a robust interatomic potential for aluminum [4.6028828826414925]
Machine learning (ML) based potentials aim for faithful emulation of quantum mechanics (QM) calculations at drastically reduced computational cost.
We present a highly automated approach to dataset construction using the principles of active learning (AL).
We demonstrate this approach by building an ML potential for aluminum (ANI-Al).
To demonstrate transferability, we perform a 1.3M atom shock simulation, and show that ANI-Al predictions agree very well with DFT calculations on local atomic environments sampled from the nonequilibrium dynamics.
arXiv Detail & Related papers (2020-03-10T19:06:32Z)
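The automated dataset construction in the last entry follows a generic query-by-committee active-learning loop; the sketch below shows only the control flow, and every callable is a placeholder rather than the ANI-Al implementation.
```python
import numpy as np

def active_learning_loop(ensemble, train_set, sampler, dft_label,
                         n_rounds=10, n_query=100):
    """Generic query-by-committee AL loop (all callables are hypothetical)."""
    for _ in range(n_rounds):
        candidates = sampler()  # e.g. MD driven by the current potential
        # Ensemble force disagreement as the uncertainty signal:
        preds = np.stack([m.predict_forces(candidates) for m in ensemble])
        uncertainty = preds.std(axis=0).max(axis=(1, 2))  # per-configuration scalar
        worst = np.argsort(uncertainty)[-n_query:]        # most uncertain configs
        train_set.extend(dft_label(candidates[i]) for i in worst)
        for m in ensemble:
            m.fit(train_set)  # retrain each committee member on the enlarged set
    return ensemble
```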