Matbench Discovery -- A framework to evaluate machine learning crystal
stability predictions
- URL: http://arxiv.org/abs/2308.14920v2
- Date: Sun, 4 Feb 2024 17:12:47 GMT
- Title: Matbench Discovery -- A framework to evaluate machine learning crystal
stability predictions
- Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang,
Bowen Deng, Alpha A. Lee, Anubhav Jain, Kristin A. Persson
- Abstract summary: Matbench Discovery simulates the deployment of machine learning (ML) energy models in a search for stable inorganic crystals.
We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance.
- Score: 2.234359119457391
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Matbench Discovery simulates the deployment of machine learning (ML) energy
models in a high-throughput search for stable inorganic crystals. We address
the disconnect between (i) thermodynamic stability and formation energy and
(ii) in-domain vs out-of-distribution performance. Alongside this paper, we
publish a Python package to aid with future model submissions and a growing
online leaderboard with further insights into trade-offs between various
performance metrics. To answer the question which ML methodology performs best
at materials discovery, our initial release explores a variety of models
including random forests, graph neural networks (GNN), one-shot predictors,
iterative Bayesian optimizers and universal interatomic potentials (UIP).
Ranked best-to-worst by their test set F1 score on thermodynamic stability
prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P
> Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest.
The top 3 models are UIPs, the winning methodology for ML-guided materials
discovery, achieving F1 scores of ~0.6 for crystal stability classification and
discovery acceleration factors (DAF) of up to 5x on the first 10k most stable
predictions compared to dummy selection from our test set. We also highlight a
sharp disconnect between commonly used global regression metrics and more
task-relevant classification metrics. Accurate regressors can still suffer
unexpectedly high false-positive rates when their predictions cluster near the
decision boundary at 0 eV/atom above the convex hull, where most materials
lie. Our results highlight the need to focus on classification metrics that
actually correlate with an improved stability hit rate.
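The stability-classification and DAF metrics described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the benchmark's actual code: a crystal is counted as stable when its (predicted or DFT) energy above the convex hull is at or below 0 eV/atom, and the DAF is the stable-hit rate among the k most stable predictions divided by the prevalence of stable materials in the full test set. The function names and toy data are hypothetical.

```python
def stability_f1(e_hull_true, e_hull_pred, threshold=0.0):
    """F1 score for stable/unstable classification.

    A material counts as stable when its energy above the convex hull
    (eV/atom) is <= threshold; predictions are scored against DFT truth.
    """
    tp = fp = fn = 0
    for e_true, e_pred in zip(e_hull_true, e_hull_pred):
        pred_stable = e_pred <= threshold
        true_stable = e_true <= threshold
        if pred_stable and true_stable:
            tp += 1
        elif pred_stable:
            fp += 1
        elif true_stable:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def discovery_acceleration_factor(e_hull_true, e_hull_pred, k, threshold=0.0):
    """DAF: hit rate among the k most stable predictions, divided by the
    base rate of stable materials in the whole test set (dummy selection)."""
    ranked = sorted(range(len(e_hull_pred)), key=lambda i: e_hull_pred[i])
    top_k = ranked[:k]
    hit_rate = sum(e_hull_true[i] <= threshold for i in top_k) / k
    prevalence = sum(e <= threshold for e in e_hull_true) / len(e_hull_true)
    return hit_rate / prevalence if prevalence else float("inf")


# Toy example: 6 hypothetical crystals with DFT and ML hull distances.
e_true = [-0.10, 0.05, -0.02, 0.30, 0.20, -0.15]
e_pred = [-0.08, -0.01, 0.01, 0.25, 0.30, -0.20]

print(stability_f1(e_true, e_pred))                      # 2 TP, 1 FP, 1 FN
print(discovery_acceleration_factor(e_true, e_pred, 2))  # vs. 50% base rate
```

This also illustrates the abstract's point about near-boundary errors: the crystals at 0.05 and -0.02 eV/atom are predicted accurately in a regression sense (errors of ~0.03-0.06 eV/atom) yet are both misclassified, since they sit right at the 0 eV/atom decision boundary.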
Related papers
- A Causal Graph-Enhanced Gaussian Process Regression for Modeling Engine-out NOx [0.0]
The objective of this paper is to develop and validate a probabilistic model to predict engine-out NOx emissions using Gaussian process regression.
We employ three variants of Gaussian process models: the first with a standard radial basis function kernel with input window, the second incorporating a deep kernel using convolutional neural networks to capture temporal dependencies, and the third enriching the deep kernel with a causal graph derived via graph convolutional networks.
All models are compared against a virtual ECM sensor using both quantitative and qualitative metrics. We conclude that our model provides an improvement in predictive performance when using an input window and a deep kernel structure.
arXiv Detail & Related papers (2024-10-24T04:23:57Z) - Towards Long-Term predictions of Turbulence using Neural Operators [68.8204255655161]
This work aims to develop reduced-order/surrogate models for turbulent-flow simulations using machine learning.
Different model structures are analyzed, with U-NET structures performing better than the standard FNO in accuracy and stability.
arXiv Detail & Related papers (2023-07-25T14:09:53Z) - Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Selecting Robust Features for Machine Learning Applications using
Multidata Causal Discovery [7.8814500102882805]
We introduce a Multidata causal feature selection approach that simultaneously processes an ensemble of time series datasets.
This approach uses the causal discovery algorithms PC1 or PCMCI that are implemented in the Tigramite Python package.
We apply our framework to the statistical intensity prediction of Western Pacific Tropical Cyclones.
arXiv Detail & Related papers (2023-04-11T15:43:34Z) - Estimating oil recovery factor using machine learning: Applications of
XGBoost classification [0.0]
In petroleum engineering, it is essential to determine the ultimate recovery factor, RF, particularly before exploration and exploitation.
We, therefore, applied machine learning (ML), using readily available features, to estimate oil RF for ten classes defined in this study.
arXiv Detail & Related papers (2022-10-28T18:21:25Z) - Transformer Uncertainty Estimation with Hierarchical Stochastic
Attention [8.95459272947319]
We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical self-attention that attends to values and a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
arXiv Detail & Related papers (2021-12-27T16:43:31Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Beyond Point Estimate: Inferring Ensemble Prediction Variation from
Neuron Activation Strength in Recommender Systems [21.392694985689083]
Ensemble methods are a state-of-the-art benchmark for prediction uncertainty estimation.
We observe that prediction variations come from various randomness sources.
We propose to infer prediction variation from neuron activation strength and demonstrate the strong prediction power from activation strength features.
arXiv Detail & Related papers (2020-08-17T00:08:27Z) - Superiority of Simplicity: A Lightweight Model for Network Device
Workload Prediction [58.98112070128482]
We propose a lightweight solution for series prediction based on historic observations.
It consists of a heterogeneous ensemble method composed of two models - a neural network and a mean predictor.
It achieves an overall $R^2$ score of 0.10 on the available FedCSIS 2020 challenge dataset.
arXiv Detail & Related papers (2020-07-07T15:44:16Z) - Assessing Graph-based Deep Learning Models for Predicting Flash Point [52.931492216239995]
Graph-based deep learning (GBDL) models were implemented in predicting flash point for the first time.
Average $R^2$ and mean absolute error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than in previous comparable studies.
arXiv Detail & Related papers (2020-02-26T06:10:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.