Matbench Discovery -- A framework to evaluate machine learning crystal
stability predictions
- URL: http://arxiv.org/abs/2308.14920v2
- Date: Sun, 4 Feb 2024 17:12:47 GMT
- Title: Matbench Discovery -- A framework to evaluate machine learning crystal
stability predictions
- Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang,
Bowen Deng, Alpha A. Lee, Anubhav Jain, Kristin A. Persson
- Abstract summary: Matbench Discovery simulates the deployment of machine learning (ML) energy models in a search for stable inorganic crystals.
We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance.
- Score: 2.234359119457391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matbench Discovery simulates the deployment of machine learning (ML) energy
models in a high-throughput search for stable inorganic crystals. We address
the disconnect between (i) thermodynamic stability and formation energy and
(ii) in-domain vs out-of-distribution performance. Alongside this paper, we
publish a Python package to aid with future model submissions and a growing
online leaderboard with further insights into trade-offs between various
performance metrics. To answer the question of which ML methodology performs best
at materials discovery, our initial release explores a variety of models
including random forests, graph neural networks (GNN), one-shot predictors,
iterative Bayesian optimizers and universal interatomic potentials (UIP).
Ranked best-to-worst by their test set F1 score on thermodynamic stability
prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P
> Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest.
The top 3 models are UIPs, the winning methodology for ML-guided materials
discovery, achieving F1 scores of ~0.6 for crystal stability classification and
discovery acceleration factors (DAF) of up to 5x on the first 10k most stable
predictions compared to dummy selection from our test set. We also highlight a
sharp disconnect between commonly used global regression metrics and more
task-relevant classification metrics. Accurate regressors are susceptible to
unexpectedly high false-positive rates if those accurate predictions lie close
to the decision boundary at 0 eV/atom above the convex hull where most
materials are. Our results highlight the need to focus on classification
metrics that actually correlate with improved stability hit rate.
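The abstract's point about the disconnect between regression and classification metrics can be illustrated with a minimal NumPy sketch. The numbers below are synthetic (not from the paper's WBM test set): a regressor with a small global error still misclassifies many materials sitting near the 0 eV/atom hull boundary, and the discovery acceleration factor (DAF) is simply the stable-hit rate in the top-k predictions divided by the base rate of stable materials.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic test set: energies above the convex hull (eV/atom).
# Stable is defined as <= 0 eV/atom, the decision boundary in the paper.
e_true = rng.normal(loc=0.05, scale=0.1, size=10_000)
# A regressor with small global error can still flip labels for the
# many materials clustered near the 0 eV/atom boundary.
e_pred = e_true + rng.normal(scale=0.04, size=e_true.size)

stable_true = e_true <= 0
stable_pred = e_pred <= 0

tp = np.sum(stable_pred & stable_true)
fp = np.sum(stable_pred & ~stable_true)
fn = np.sum(~stable_pred & stable_true)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# DAF: stable-hit rate among the k most stable predictions,
# relative to dummy selection from the whole test set.
k = 1_000
top_k = np.argsort(e_pred)[:k]
hit_rate = stable_true[top_k].mean()
base_rate = stable_true.mean()
daf = hit_rate / base_rate

print(f"MAE={np.abs(e_pred - e_true).mean():.3f} eV/atom, "
      f"F1={f1:.2f}, DAF={daf:.1f}")
```

Even with a low MAE, the false positives concentrate just above the hull, which is exactly the regime a screening campaign cares about.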
Related papers
- A Causal Graph-Enhanced Gaussian Process Regression for Modeling Engine-out NOx [0.0]
The objective of this paper is to develop and validate a probabilistic model to predict engine-out NOx emissions using Gaussian process regression.
We employ three variants of Gaussian process models: the first with a standard radial basis function kernel with input window, the second incorporating a deep kernel using convolutional neural networks to capture temporal dependencies, and the third enriching the deep kernel with a causal graph derived via graph convolutional networks.
All models are compared against a virtual ECM sensor using both quantitative and qualitative metrics. We conclude that our model provides an improvement in predictive performance when using an input window and a deep kernel structure.
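The first (standard RBF-kernel) variant described above can be sketched in plain NumPy; the toy data and hyperparameters here are illustrative and not taken from the paper.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel on 1-D inputs.
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2):
    # Standard GP regression posterior mean and standard deviation.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train)
    Kss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Toy regression target standing in for an emissions signal.
x = np.linspace(0.0, 2.0 * np.pi, 20)
y = np.sin(x)
xs = np.linspace(0.0, 2.0 * np.pi, 50)
mean, std = gp_predict(x, y, xs)
```

The deep-kernel and causal-graph variants replace `rbf_kernel` with a learned feature map; the posterior algebra stays the same.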
arXiv Detail & Related papers (2024-10-24T04:23:57Z) - Enhancing Microgrid Performance Prediction with Attention-based Deep Learning Models [0.0]
This research aims to address microgrid systems' operational challenges, characterized by power oscillations that contribute to grid instability.
An integrated strategy is proposed, leveraging the strengths of convolutional and Gated Recurrent Unit (GRU) layers.
The framework is anchored by a Multi-Layer Perceptron (MLP) model, which is tasked with comprehensive load forecasting.
arXiv Detail & Related papers (2024-07-20T21:24:11Z) - Comparing Hyper-optimized Machine Learning Models for Predicting Efficiency Degradation in Organic Solar Cells [39.847063110051245]
This work presents a set of optimal machine learning (ML) models to represent the temporal degradation suffered by the power conversion efficiency (PCE) of organic solar cells (OSCs).
We generated a database with 996 entries, which includes up to 7 variables regarding both the manufacturing process and environmental conditions for more than 180 days.
The accuracy achieved reaches values of the coefficient of determination (R2) widely exceeding 0.90, whereas the root mean squared error (RMSE), sum of squared error (SSE), and mean absolute error (MAE) remain below 1% of the target value, the PCE.
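The four metrics quoted above are standard and easy to compute side by side; the degradation curve below is a hypothetical stand-in for a PCE time series, not data from the paper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # R2, RMSE, SSE, and MAE as quoted in the abstract.
    resid = y_true - y_pred
    sse = float(np.sum(resid ** 2))
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - sse / ss_tot
    return {"R2": r2, "RMSE": rmse, "SSE": sse, "MAE": mae}

# Hypothetical PCE degradation over ~180 days and a close prediction.
t = np.linspace(0.0, 180.0, 181)          # days
pce_true = 10.0 * np.exp(-t / 400.0)      # % efficiency
pce_pred = pce_true + 0.02 * np.sin(t / 10.0)
m = regression_metrics(pce_true, pce_pred)
```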
arXiv Detail & Related papers (2024-03-29T22:05:26Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Towards Long-Term predictions of Turbulence using Neural Operators [68.8204255655161]
This work aims to develop reduced-order/surrogate models for turbulent flow simulations using machine learning.
Different model structures are analyzed, with U-NET structures performing better than the standard FNO in accuracy and stability.
arXiv Detail & Related papers (2023-07-25T14:09:53Z) - Estimating oil recovery factor using machine learning: Applications of
XGBoost classification [0.0]
In petroleum engineering, it is essential to determine the ultimate recovery factor, RF, particularly before exploitation and exploration.
We, therefore, applied machine learning (ML), using readily available features, to estimate oil RF for ten classes defined in this study.
arXiv Detail & Related papers (2022-10-28T18:21:25Z) - Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution
Detection [55.028065567756066]
Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications.
In this paper we propose an uncertainty quantification approach by modelling the distribution of features.
We incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble neural networks (BE-SNNs) and overcome the feature collapse problem.
We show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionM
arXiv Detail & Related papers (2022-06-26T16:00:22Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
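A bootstrap confidence interval is one simple way to attach the kind of reliability estimate described above to a performance number; the per-example correctness data here is simulated, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated per-example correctness of a model on a 500-item test set
# (true accuracy ~0.8).
correct = rng.random(500) < 0.8

# Percentile-bootstrap 95% confidence interval for the accuracy
# estimate: resample the test set with replacement many times.
boot = [rng.choice(correct, size=correct.size).mean()
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
```

Calibration, the second angle the paper proposes, would instead compare predicted confidence levels against empirical coverage.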
arXiv Detail & Related papers (2021-02-10T15:23:20Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - AIBench Training: Balanced Industry-Standard AI Training Benchmarking [26.820244556465333]
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks.
We use real-world benchmarks to cover the factors space that impacts the learning dynamics.
We contribute by far the most comprehensive AI training benchmark suite.
arXiv Detail & Related papers (2020-04-30T11:08:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.