Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials
- URL: http://arxiv.org/abs/2312.01416v2
- Date: Sat, 02 Nov 2024 11:26:08 GMT
- Title: Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials
- Authors: Viktor Zaverkin, David Holzmüller, Henrik Christiansen, Federico Errica, Francesco Alesiani, Makoto Takamoto, Mathias Niepert, Johannes Kästner
- Abstract summary: Active learning uses biased or unbiased molecular dynamics to generate candidate pools.
Existing biased and unbiased MD-simulation methods are prone to miss either rare events or extrapolative regions.
This work demonstrates that MD, when biased by the MLIP's energy uncertainty, simultaneously captures extrapolative regions and rare events.
- Score: 25.091146216183144
- Abstract: Efficiently creating a concise but comprehensive data set for training machine-learned interatomic potentials (MLIPs) is an under-explored problem. Active learning, which uses biased or unbiased molecular dynamics (MD) to generate candidate pools, aims to address this objective. Existing biased and unbiased MD-simulation methods, however, are prone to miss either rare events or extrapolative regions -- areas of the configurational space where unreliable predictions are made. This work demonstrates that MD, when biased by the MLIP's energy uncertainty, simultaneously captures extrapolative regions and rare events, which is crucial for developing uniformly accurate MLIPs. Furthermore, exploiting automatic differentiation, we enhance bias-forces-driven MD with the concept of bias stress. We employ calibrated gradient-based uncertainties to yield MLIPs with similar or, sometimes, better accuracy than ensemble-based methods at a lower computational cost. Finally, we apply uncertainty-biased MD to alanine dipeptide and MIL-53(Al), generating MLIPs that represent both configurational spaces more accurately than models trained with conventional MD.
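To make the core idea concrete, here is a minimal sketch of uncertainty-biased MD in JAX, assuming a bias of the form E_bias(x) = E(x) - tau * sigma(x); the toy pair energy, the placeholder uncertainty, and all names are illustrative assumptions, not the authors' implementation (the paper's bias-stress term is omitted):

```python
import jax
import jax.numpy as jnp

def mlip_energy(x):
    """Toy Lennard-Jones-like stand-in for a trained MLIP energy; x: (n_atoms, 3)."""
    d = x[:, None, :] - x[None, :, :]
    r2 = jnp.sum(d * d, axis=-1) + jnp.eye(x.shape[0])  # pad diagonal to avoid r=0
    inv6 = r2 ** -3
    return jnp.sum(jnp.triu(inv6 * inv6 - inv6, k=1))   # sum over unique pairs

def energy_uncertainty(x):
    """Placeholder for a calibrated, gradient-based energy uncertainty sigma(x)."""
    return 0.1 * jnp.sum(jnp.tanh(jnp.linalg.norm(x, axis=-1)))

def biased_energy(x, tau):
    # E_bias = E - tau * sigma: lowering the energy in uncertain regions pulls
    # the trajectory toward extrapolative parts of configuration space.
    return mlip_energy(x) - tau * energy_uncertainty(x)

# Bias forces come for free via automatic differentiation: F = -dE_bias/dx.
biased_force = jax.jit(jax.grad(lambda x, tau: -biased_energy(x, tau)))

def velocity_verlet_step(x, v, tau, dt=1e-3, mass=1.0):
    """One MD step under the uncertainty-biased potential."""
    v_half = v + 0.5 * dt * biased_force(x, tau) / mass
    x_new = x + dt * v_half
    v_new = v_half + 0.5 * dt * biased_force(x_new, tau) / mass
    return x_new, v_new

# Usage: a short biased trajectory from random initial positions.
x = jax.random.normal(jax.random.PRNGKey(0), (8, 3))
v = jnp.zeros_like(x)
for _ in range(100):
    x, v = velocity_verlet_step(x, v, tau=0.5)
```

The bias strength tau trades exploration against physical fidelity: tau = 0 recovers unbiased MD, while large tau drives the trajectory into high-uncertainty regions that most need new training data.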
Related papers
- Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials are crucial to quantify the additional model error they introduce.
We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules.
We evaluate how the GPR variance and ensemble-based uncertainties relate to the error, and whether model performance improves by selecting the most uncertain samples from a fixed configuration space.
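For reference, the GPR variance mentioned above has the closed form sigma^2(x*) = k(x*, x*) - k_*^T (K + sigma_n^2 I)^{-1} k_*; a minimal sketch with an RBF kernel (the kernel choice and all names here are illustrative, not this paper's setup):

```python
import jax.numpy as jnp

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * sq / lengthscale**2)

def gpr_variance(X_train, x_star, noise=1e-6, lengthscale=1.0):
    """Predictive variance sigma^2(x*) at a query point x* -- the
    per-configuration uncertainty a GPR-based MLIP can report alongside
    its energy prediction."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * jnp.eye(len(X_train))
    k_star = rbf_kernel(X_train, x_star[None, :], lengthscale)        # (n, 1)
    k_ss = rbf_kernel(x_star[None, :], x_star[None, :], lengthscale)  # (1, 1)
    return (k_ss - k_star.T @ jnp.linalg.solve(K, k_star))[0, 0]
```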
arXiv Detail & Related papers (2024-10-27T10:06:09Z)
- Towards a Theoretical Understanding of Memorization in Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for Generative Artificial Intelligence (GenAI).
We provide a theoretical understanding of memorization in both conditional and unconditional DPMs under the assumption of model convergence.
We propose a novel data extraction method named Surrogate condItional Data Extraction (SIDE) that leverages a time-dependent classifier trained on the generated data as a surrogate condition to extract training data from unconditional DPMs.
arXiv Detail & Related papers (2024-10-03T13:17:06Z)
- Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space [72.52365911990935]
We introduce Bellman Diffusion, a novel DGM framework that maintains linearity in MDPs through gradient and scalar field modeling.
Our results show that Bellman Diffusion achieves accurate field estimations and is a capable image generator, converging 1.5x faster than the traditional histogram-based baseline in distributional RL tasks.
arXiv Detail & Related papers (2024-10-02T17:53:23Z)
- Mitigating Exposure Bias in Score-Based Generation of Molecular Conformations [6.442534896075223]
We propose a method for measuring exposure bias in Score-Based Generative Models used for molecular conformation generation.
We design a new compensation algorithm, Input Perturbation (IP), adapted from a method originally designed only for DPMs.
We achieve new state-of-the-art performance on the GEOM-Drugs dataset and match the state of the art on GEOM-QM9.
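As a rough illustration of what input perturbation looks like in noise-prediction training (the form below follows the original DPM variant of IP; gamma and all names are assumptions, not this paper's exact algorithm):

```python
import jax
import jax.numpy as jnp

def ip_training_pair(x0, t, alpha_bar, key, gamma=0.1):
    """Build one training pair with input perturbation (IP).

    The network still regresses the clean noise eps, but its input state is
    built from a perturbed noise eps + gamma * eps', which mimics the
    prediction error that accumulates at sampling time (exposure bias)."""
    k1, k2 = jax.random.split(key)
    eps = jax.random.normal(k1, x0.shape)        # regression target
    eps_prime = jax.random.normal(k2, x0.shape)  # extra perturbation
    a = alpha_bar[t]
    x_t = jnp.sqrt(a) * x0 + jnp.sqrt(1.0 - a) * (eps + gamma * eps_prime)
    return x_t, eps
```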
arXiv Detail & Related papers (2024-09-21T04:54:37Z)
- Accelerating the Training and Improving the Reliability of Machine-Learned Interatomic Potentials for Strongly Anharmonic Materials through Active Learning [0.0]
We show that an active learning scheme that combines MD with MLIPs (MLIP-MD) and uncertainty estimates can avoid such problematic predictions.
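Schematically, such an uncertainty-aware MLIP-MD active-learning loop might look as follows; every callable here is a user-supplied placeholder, not an API from the paper:

```python
def active_learning_loop(model, x0, md_step, uncertainty, label, retrain,
                         n_rounds=5, n_md_steps=1000, threshold=0.1):
    """Generic uncertainty-driven MLIP-MD active-learning loop (illustrative).

    md_step, uncertainty, label, and retrain are user-supplied callables:
    MD propagation with the current MLIP, an uncertainty estimate, a
    reference calculation (e.g. DFT), and a refit of the MLIP."""
    dataset = []
    for _ in range(n_rounds):
        x, candidates = x0, []
        for _ in range(n_md_steps):
            x = md_step(model, x)
            if uncertainty(model, x) > threshold:  # flag extrapolative configs
                candidates.append(x)
        dataset += [(c, label(c)) for c in candidates]  # acquire reference labels
        model = retrain(model, dataset)                 # grow data, refit model
    return model, dataset
```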
arXiv Detail & Related papers (2024-09-18T08:52:30Z)
- Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a time-dependent classifier trained on the generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
- A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics [74.93549765488103]
In drug discovery, molecular dynamics simulation provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites.
We propose NeuralMD, the first machine learning surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding.
We show the efficiency and effectiveness of NeuralMD, with a 2000× speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric.
arXiv Detail & Related papers (2024-01-26T09:35:17Z)
- Accurate machine learning force fields via experimental and simulation data fusion [0.0]
Machine Learning (ML)-based force fields are attracting ever-increasing interest due to their capacity to reach the time and length scales of classical interatomic potentials while retaining quantum-level accuracy.
Here we leverage both Density Functional Theory (DFT) calculations and experimentally measured mechanical properties and lattice parameters to train an ML potential of titanium.
We demonstrate that the fused data learning strategy can concurrently satisfy all target objectives, resulting in a molecular model of higher accuracy than models trained on a single data source.
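The fusion can be expressed as a weighted multi-objective loss over DFT data and experimental observables; a hedged sketch (the weights, batch layout, and prediction callables are invented for illustration):

```python
import jax.numpy as jnp

def fused_loss(params, predict_ef, predict_props, dft_batch, exp_targets,
               w_dft=1.0, w_exp=0.1):
    """Weighted sum of a DFT fitting loss and an experimental-property loss.

    predict_ef maps (params, positions) -> (energies, forces); predict_props
    maps params -> simulated observables (lattice parameters, moduli, ...)."""
    e_pred, f_pred = predict_ef(params, dft_batch["positions"])
    loss_dft = (jnp.mean((e_pred - dft_batch["energies"]) ** 2)
                + jnp.mean((f_pred - dft_batch["forces"]) ** 2))
    loss_exp = jnp.mean((predict_props(params) - exp_targets) ** 2)
    return w_dft * loss_dft + w_exp * loss_exp
```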
arXiv Detail & Related papers (2023-08-17T18:22:19Z)
- Physics-informed machine learning with differentiable programming for heterogeneous underground reservoir pressure management [64.17887333976593]
Avoiding over-pressurization in subsurface reservoirs is critical for applications like CO2 sequestration and wastewater injection.
Managing these pressures by controlling injection/extraction rates is challenging because of complex heterogeneity in the subsurface.
We use differentiable programming with a full-physics model and machine learning to determine the fluid extraction rates that prevent over-pressurization.
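With a differentiable full-physics model, choosing safe extraction rates reduces to gradient descent through the simulator; a minimal sketch (the quadratic over-pressure penalty and all names are assumptions, not this paper's method):

```python
import jax
import jax.numpy as jnp

def over_pressure_penalty(rates, simulate, p_max):
    """Quadratic penalty on simulated pressures exceeding the safe limit p_max.

    `simulate` is a differentiable full-physics model mapping extraction
    rates to a pressure field (placeholder)."""
    return jnp.sum(jnp.maximum(simulate(rates) - p_max, 0.0) ** 2)

def optimize_rates(rates, simulate, p_max, lr=1e-2, steps=500):
    """Plain gradient descent on the extraction rates through the simulator."""
    grad_fn = jax.grad(over_pressure_penalty)  # differentiates w.r.t. rates
    for _ in range(steps):
        rates = rates - lr * grad_fn(rates, simulate, p_max)
    return rates
```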
arXiv Detail & Related papers (2022-06-21T20:38:13Z)
- A data-driven peridynamic continuum model for upscaling molecular dynamics [3.1196544696082613]
We propose a learning framework to extract, from molecular dynamics data, an optimal Linear Peridynamic Solid (LPS) model.
We provide sufficient well-posedness conditions for discretized LPS models with sign-changing influence functions.
This framework guarantees that the resulting model is mathematically well-posed, physically consistent, and that it generalizes well to settings that are different from the ones used during training.
arXiv Detail & Related papers (2021-08-04T07:07:47Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)