Matbench Discovery -- A framework to evaluate machine learning crystal
stability predictions
- URL: http://arxiv.org/abs/2308.14920v2
- Date: Sun, 4 Feb 2024 17:12:47 GMT
- Title: Matbench Discovery -- A framework to evaluate machine learning crystal
stability predictions
- Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang,
Bowen Deng, Alpha A. Lee, Anubhav Jain, Kristin A. Persson
- Abstract summary: Matbench Discovery simulates the deployment of machine learning (ML) energy models in a search for stable inorganic crystals.
We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance.
- Score: 2.234359119457391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matbench Discovery simulates the deployment of machine learning (ML) energy
models in a high-throughput search for stable inorganic crystals. We address
the disconnect between (i) thermodynamic stability and formation energy and
(ii) in-domain vs out-of-distribution performance. Alongside this paper, we
publish a Python package to aid with future model submissions and a growing
online leaderboard with further insights into trade-offs between various
performance metrics. To answer the question which ML methodology performs best
at materials discovery, our initial release explores a variety of models
including random forests, graph neural networks (GNN), one-shot predictors,
iterative Bayesian optimizers and universal interatomic potentials (UIP).
Ranked best-to-worst by their test set F1 score on thermodynamic stability
prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P
> Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest.
The top 3 models are UIPs, the winning methodology for ML-guided materials
discovery, achieving F1 scores of ~0.6 for crystal stability classification and
discovery acceleration factors (DAF) of up to 5x on the first 10k most stable
predictions compared to dummy selection from our test set. We also highlight a
sharp disconnect between commonly used global regression metrics and more
task-relevant classification metrics. Accurate regressors can still suffer
unexpectedly high false-positive rates when their predictions cluster near the
decision boundary at 0 eV/atom above the convex hull, where most materials
lie. Our results highlight the need to focus on classification metrics that
actually correlate with an improved stability hit rate.
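The stability-classification and DAF metrics described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the benchmark's actual code: a crystal is counted as stable when its (predicted or DFT) energy above the convex hull is at or below 0 eV/atom, and the DAF is the stable-hit rate among the k most stable predictions divided by the prevalence of stable materials in the full test set. The function names and toy data are hypothetical.

```python
def stability_f1(e_hull_true, e_hull_pred, threshold=0.0):
    """F1 score for stable/unstable classification.

    A material counts as stable when its energy above the convex hull
    (eV/atom) is <= threshold; predictions are scored against DFT truth.
    """
    tp = fp = fn = 0
    for e_true, e_pred in zip(e_hull_true, e_hull_pred):
        pred_stable = e_pred <= threshold
        true_stable = e_true <= threshold
        if pred_stable and true_stable:
            tp += 1
        elif pred_stable:
            fp += 1
        elif true_stable:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def discovery_acceleration_factor(e_hull_true, e_hull_pred, k, threshold=0.0):
    """DAF: hit rate among the k most stable predictions, divided by the
    base rate of stable materials in the whole test set (dummy selection)."""
    ranked = sorted(range(len(e_hull_pred)), key=lambda i: e_hull_pred[i])
    top_k = ranked[:k]
    hit_rate = sum(e_hull_true[i] <= threshold for i in top_k) / k
    prevalence = sum(e <= threshold for e in e_hull_true) / len(e_hull_true)
    return hit_rate / prevalence if prevalence else float("inf")


# Toy example: 6 hypothetical crystals with DFT and ML hull distances.
e_true = [-0.10, 0.05, -0.02, 0.30, 0.20, -0.15]
e_pred = [-0.08, -0.01, 0.01, 0.25, 0.30, -0.20]

print(stability_f1(e_true, e_pred))                      # 2 TP, 1 FP, 1 FN
print(discovery_acceleration_factor(e_true, e_pred, 2))  # vs. 50% base rate
```

This also illustrates the abstract's point about near-boundary errors: the crystals at 0.05 and -0.02 eV/atom are predicted accurately in a regression sense (errors of ~0.03-0.06 eV/atom) yet are both misclassified, since they sit right at the 0 eV/atom decision boundary.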
Related papers
- A Causal Graph-Enhanced Gaussian Process Regression for Modeling Engine-out NOx [0.0]
The objective of this paper is to develop and validate a probabilistic model to predict engine-out NOx emissions using Gaussian process regression.
We employ three variants of Gaussian process models: the first with a standard radial basis function kernel with input window, the second incorporating a deep kernel using convolutional neural networks to capture temporal dependencies, and the third enriching the deep kernel with a causal graph derived via graph convolutional networks.
All models are compared against a virtual ECM sensor using both quantitative and qualitative metrics. We conclude that our model provides an improvement in predictive performance when using an input window and a deep kernel structure.
arXiv Detail & Related papers (2024-10-24T04:23:57Z) - Towards Long-Term predictions of Turbulence using Neural Operators [68.8204255655161]
This work aims to develop reduced-order/surrogate models for turbulent-flow simulations using machine learning.
Different model structures are analyzed, with U-NET structures performing better than the standard FNO in accuracy and stability.
arXiv Detail & Related papers (2023-07-25T14:09:53Z) - Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Selecting Robust Features for Machine Learning Applications using
Multidata Causal Discovery [7.8814500102882805]
We introduce a Multidata causal feature selection approach that simultaneously processes an ensemble of time series datasets.
This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package.
We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones.
arXiv Detail & Related papers (2023-04-11T15:43:34Z) - Estimating oil recovery factor using machine learning: Applications of
XGBoost classification [0.0]
In petroleum engineering, it is essential to determine the ultimate recovery factor, RF, particularly before exploration and exploitation.
We, therefore, applied machine learning (ML), using readily available features, to estimate oil RF for ten classes defined in this study.
arXiv Detail & Related papers (2022-10-28T18:21:25Z) - Transformer Uncertainty Estimation with Hierarchical Stochastic
Attention [8.95459272947319]
We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical self-attention that attends to values and a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
arXiv Detail & Related papers (2021-12-27T16:43:31Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Beyond Point Estimate: Inferring Ensemble Prediction Variation from
Neuron Activation Strength in Recommender Systems [21.392694985689083]
Ensemble methods are a state-of-the-art benchmark for prediction uncertainty estimation.
We observe that prediction variations come from various randomness sources.
We propose to infer prediction variation from neuron activation strength and demonstrate the strong prediction power from activation strength features.
arXiv Detail & Related papers (2020-08-17T00:08:27Z) - Superiority of Simplicity: A Lightweight Model for Network Device
Workload Prediction [58.98112070128482]
We propose a lightweight solution for series prediction based on historic observations.
It consists of a heterogeneous ensemble method composed of two models - a neural network and a mean predictor.
It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z) - Assessing Graph-based Deep Learning Models for Predicting Flash Point [52.931492216239995]
Graph-based deep learning (GBDL) models were implemented in predicting flash point for the first time.
Average $R^2$ and mean absolute error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than in previous comparable studies.
arXiv Detail & Related papers (2020-02-26T06:10:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.