Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theory
- URL: http://arxiv.org/abs/2404.12367v2
- Date: Wed, 18 Sep 2024 16:30:21 GMT
- Title: Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theory
- Authors: Daniel Schwalbe-Koda, Sebastien Hamel, Babak Sadigh, Fei Zhou, Vincenzo Lordi
- Abstract summary: Atomistic machine learning (ML) often relies on unsupervised learning or model predictions to analyze information contents.
Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations.
- Score: 4.59916193837551
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: An accurate description of information is relevant for a range of problems in atomistic machine learning (ML), such as crafting training sets, performing uncertainty quantification (UQ), or extracting physical insights from large datasets. However, atomistic ML often relies on unsupervised learning or model predictions to analyze information contents from simulation or training data. Here, we introduce a theoretical framework that provides a rigorous, model-free tool to quantify information contents in atomistic simulations. We demonstrate that the information entropy of a distribution of atom-centered environments explains known heuristics in ML potential developments, from training set sizes to dataset optimality. Using this tool, we propose a model-free UQ method that reliably predicts epistemic uncertainty and detects out-of-distribution samples, including rare events in systems such as nucleation. This method provides a general tool for data-driven atomistic modeling and combines efforts in ML, simulations, and physical explainability.
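As a rough illustration of the entropy-based view described in the abstract (a minimal sketch, not the paper's implementation), the snippet below estimates a differential entropy over a set of atom-centered descriptors with a Gaussian kernel density estimate and uses the negative log-density under the training distribution as a model-free outlier score; the descriptors, bandwidth, and threshold are placeholder assumptions.

```python
# Minimal sketch (not the paper's implementation): a kernel density estimate over
# atom-centered descriptors gives (i) a Monte Carlo estimate of the dataset's
# differential entropy and (ii) a model-free outlier score for test environments.
# The descriptors, bandwidth, and threshold below are placeholder assumptions.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Placeholder descriptors: rows are atom-centered environments (e.g., SOAP-like vectors).
train_desc = rng.normal(size=(500, 8))                      # "training set" environments
test_desc = np.vstack([rng.normal(size=(50, 8)),            # in-distribution samples
                       rng.normal(loc=6.0, size=(5, 8))])   # 5 out-of-distribution samples

kde = gaussian_kde(train_desc.T)                            # density over descriptor space

# Differential entropy H = -E[log p(x)], estimated on the training environments (nats).
log_p_train = np.log(kde(train_desc.T))
print(f"approx. dataset entropy: {-log_p_train.mean():.2f} nats")

# Model-free outlier score: negative log-density under the training distribution.
scores = -np.log(kde(test_desc.T) + 1e-300)
threshold = np.quantile(-log_p_train, 0.99)                 # assumed cutoff from training scores
print("flagged as out-of-distribution:", np.where(scores > threshold)[0])
```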
Related papers
- Discovering Interpretable Physical Models using Symbolic Regression and Discrete Exterior Calculus [55.2480439325792]
We propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models.
DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems.
We prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data.
arXiv Detail & Related papers (2023-10-10T13:23:05Z)
- Electronic Structure Prediction of Multi-million Atom Systems Through Uncertainty Quantification Enabled Transfer Learning [5.4875371069660925]
Ground state electron density -- obtainable using Kohn-Sham Density Functional Theory (KS-DFT) simulations -- contains a wealth of material information.
However, the computational expense of KS-DFT scales cubically with system size, which tends to stymie training data generation.
Here, we address this fundamental challenge by employing transfer learning to leverage the multi-scale nature of the training data.
arXiv Detail & Related papers (2023-08-24T21:41:29Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning [56.67751936864119]
The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels.
Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach.
The results show that ML models can accurately predict the fuel properties over a wide range of pressure and temperature conditions.
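As a hedged illustration of the surrogate idea (not the models from the paper), the sketch below fits a Gaussian process that maps pressure and temperature to a fuel property on synthetic data; the property function, kernel, and units are assumptions.

```python
# Hedged sketch of the surrogate idea (not the paper's models): a Gaussian
# process maps (pressure, temperature) to a fuel property, trained on a few
# synthetic "MD/experiment" points; the property function and kernel are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(2)

# Synthetic training data: property (e.g., density) vs. pressure [MPa] and temperature [K].
X_train = rng.uniform([0.1, 300.0], [10.0, 600.0], size=(40, 2))
y_train = 800.0 - 0.5 * X_train[:, 1] + 5.0 * X_train[:, 0] + rng.normal(0, 1.0, 40)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[1.0, 50.0]),
                              alpha=1.0, normalize_y=True)
gp.fit(X_train, y_train)

# Query the surrogate at new conditions; the GP also returns a predictive std dev.
X_query = np.array([[5.0, 450.0]])
mean, std = gp.predict(X_query, return_std=True)
print(f"predicted property: {mean[0]:.1f} +/- {std[0]:.1f}")
```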
arXiv Detail & Related papers (2021-10-18T14:43:50Z)
- Learning Transport Processes with Machine Intelligence [0.0]
We present a machine learning based approach to address the study of transport processes.
Our model is capable of learning latent representations of the transport process substantially closer to the ground truth than expected.
arXiv Detail & Related papers (2021-09-27T14:49:22Z)
- Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability that is agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an extrapolation score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z)
- Calibrated Uncertainty for Molecular Property Prediction using Ensembles of Message Passing Neural Networks [11.47132155400871]
We extend a message passing neural network designed specifically for predicting properties of molecules and materials.
We show that our approach results in accurate models for predicting molecular formation energies with calibrated uncertainty.
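The ensemble pattern behind this kind of uncertainty quantification can be sketched generically; in the snippet below, simple regressors on synthetic 1D data stand in for the message passing neural networks, and the calibration step against a validation set is not shown.

```python
# Generic ensemble-UQ pattern (simple regressors stand in for the paper's
# message passing neural networks): the ensemble mean is the prediction and
# the spread across members is an uncertainty estimate. Data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)   # toy "formation energy" target

ensemble = []
for seed in range(5):
    # Each member sees a bootstrap resample and a different initialization.
    idx = rng.integers(0, len(X), len(X))
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=seed)
    ensemble.append(model.fit(X[idx], y[idx]))

X_test = np.array([[0.5], [5.0]])                  # in-domain vs. extrapolated point
preds = np.stack([m.predict(X_test) for m in ensemble])
mean, std = preds.mean(axis=0), preds.std(axis=0)
for x, m, s in zip(X_test[:, 0], mean, std):
    print(f"x={x:+.1f}: prediction {m:+.2f} +/- {s:.2f}")
```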
arXiv Detail & Related papers (2021-07-13T13:28:11Z)
- Quantum-tailored machine-learning characterization of a superconducting qubit [50.591267188664666]
We develop an approach to characterize the dynamics of a quantum device and learn device parameters.
This approach outperforms physics-agnostic recurrent neural networks trained on numerically generated and experimental data.
This demonstration shows how leveraging domain knowledge improves the accuracy and efficiency of this characterization task.
arXiv Detail & Related papers (2021-06-24T15:58:57Z)
- Using Data Assimilation to Train a Hybrid Forecast System that Combines Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data consists of noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
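A toy version of the hybrid idea (not the paper's data-assimilation scheme) is sketched below: a machine-learned residual corrects an imperfect knowledge-based model of the logistic map, with noisy full-state observations standing in for partial measurements; the map, noise level, and polynomial regression are illustrative assumptions.

```python
# Toy sketch of a hybrid forecaster (not the paper's data-assimilation scheme):
# an imperfect knowledge-based model of the logistic map is corrected by a
# machine-learned residual term fitted to noisy observations of the state.
import numpy as np

rng = np.random.default_rng(1)
r_true, r_wrong = 3.8, 3.6           # true vs. assumed map parameter

def step(x, r):                      # logistic map x_{n+1} = r x_n (1 - x_n)
    return r * x * (1.0 - x)

# Generate a noisy observed trajectory of the true system.
x, obs = 0.4, []
for _ in range(500):
    x = step(x, r_true)
    obs.append(x + 0.01 * rng.normal())
obs = np.array(obs)

# Fit a polynomial residual model: correction(x_n) ~ x_{n+1}^obs - model(x_n).
resid = obs[1:] - step(obs[:-1], r_wrong)
coeffs = np.polyfit(obs[:-1], resid, deg=2)

def hybrid_step(x):
    return step(x, r_wrong) + np.polyval(coeffs, x)

# One-step forecast error: hybrid vs. knowledge-based model alone.
err_kb = np.abs(step(obs[:-1], r_wrong) - obs[1:]).mean()
err_hy = np.abs(hybrid_step(obs[:-1]) - obs[1:]).mean()
print(f"knowledge-based error: {err_kb:.4f}, hybrid error: {err_hy:.4f}")
```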
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
- Wavelet Scattering Networks for Atomistic Systems with Extrapolation of Material Properties [7.555136209115944]
A dream of machine learning in materials science is for a model to learn the underlying physics of an atomic system.
In this work, we test the generalizability of our $\text{Li}_{\alpha}\text{Si}$ energy predictor to properties that were not included in the training set.
arXiv Detail & Related papers (2020-06-01T20:36:17Z)
- Embedded-physics machine learning for coarse-graining and collective variable discovery without data [3.222802562733787]
We present a novel learning framework that consistently embeds underlying physics.
We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field.
We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed collective variables (CVs) for a bimodal potential energy function and the alanine dipeptide.
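For reference, a generic form of such a reverse Kullback-Leibler objective is shown below (notation assumed here, not taken from the paper): expectations are taken under the variational coarse-grained distribution $q_\theta$, so the target Boltzmann density enters only through the force-field energy $U$ and no samples from the target are needed.

```latex
D_{\mathrm{KL}}\!\left(q_\theta \,\|\, p\right)
  = \mathbb{E}_{x \sim q_\theta}\!\left[\log q_\theta(x) - \log p(x)\right]
  = \mathbb{E}_{x \sim q_\theta}\!\left[\log q_\theta(x) + \beta U(x)\right] + \log Z,
\qquad p(x) = \frac{e^{-\beta U(x)}}{Z}.
```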
arXiv Detail & Related papers (2020-02-24T10:28:41Z)