From Prediction to Action: Critical Role of Performance Estimation for Machine-Learning-Driven Materials Discovery
- URL: http://arxiv.org/abs/2311.15549v2
- Date: Thu, 7 Dec 2023 02:08:13 GMT
- Title: From Prediction to Action: Critical Role of Performance Estimation for Machine-Learning-Driven Materials Discovery
- Authors: Mario Boley, Felix Luong, Simon Teshuva, Daniel F. Schmidt, Lucas Foppa and Matthias Scheffler
- Abstract summary: We argue that the lack of proper performance estimation methods based on pre-computed data collections is a fundamental problem for improving data-driven materials discovery.
We propose a novel estimator of this kind that, in contrast to naïve reward estimation, successfully predicts Gaussian processes with the "expected improvement" acquisition function as the best of four options.
- Score: 2.3243389656894595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Materials discovery driven by statistical property models is an iterative
decision process, during which an initial data collection is extended with new
data proposed by a model-informed acquisition function, with the goal of
maximizing a certain "reward" over time, such as the maximum property value
discovered so far. While the materials science community achieved much progress
in developing property models that predict well on average with respect to the
training distribution, this form of in-distribution performance measurement is
not directly coupled with the discovery reward. This is because an iterative
discovery process has a shifting reward distribution that is
disproportionately determined by the model's performance on exceptional
materials. We demonstrate this problem using the example of bulk modulus
maximization among double perovskite oxides. We find that the in-distribution
predictive performance favors random forests over Gaussian process
regression, whereas the discovery rewards show the reverse ranking. We
argue that the lack of proper performance estimation methods based on pre-computed
data collections is a fundamental problem for improving data-driven materials
discovery, and we propose a novel estimator of this kind that, in contrast to naïve
reward estimation, successfully predicts Gaussian processes with the "expected
improvement" acquisition function as the best of four options in our
demonstration study for double perovskites. Importantly, it does so without
requiring the more than one thousand ab initio computations that were needed to
confirm this prediction.
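The abstract frames discovery as an iterative loop: fit a property model, let an acquisition function choose the next material, and track a reward such as the maximum property value discovered so far. The following Python sketch replays such a loop retrospectively over a fixed, pre-computed pool, using a small Gaussian process surrogate and the expected improvement (EI) acquisition function. It is an illustration under stated assumptions, not the performance estimator proposed in the paper: the squared-exponential kernel, its length scale, the jitter term, the toy 1-D pool, and all names (`GP`, `expected_improvement`, `simulate_discovery`) are hypothetical choices made here.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: float, b: float, length: float = 0.2) -> float:
    """Squared-exponential kernel with unit variance (hypothetical length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with K = L L^T for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GP:
    """Toy 1-D Gaussian process regressor on mean-centred targets (hypothetical surrogate)."""

    def __init__(self, xs: Sequence[float], ys: Sequence[float], jitter: float = 1e-4):
        self.xs = list(xs)
        self.offset = sum(ys) / len(ys)
        K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(self.xs)]
             for i, a in enumerate(self.xs)]
        self.L = cholesky(K)
        # alpha = K^{-1} (y - offset), computed via the Cholesky factor.
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, [y - self.offset for y in ys]))

    def predict(self, x: float) -> Tuple[float, float]:
        """Posterior mean and standard deviation at x."""
        k = [rbf(a, x) for a in self.xs]
        mean = self.offset + sum(ki * ai for ki, ai in zip(k, self.alpha))
        v = solve_lower(self.L, k)
        return mean, math.sqrt(max(1.0 - sum(vi * vi for vi in v), 0.0))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """EI for maximisation: E[max(f - best - xi, 0)] with f ~ N(mu, sigma^2)."""
    gap = mu - best - xi
    if sigma <= 0.0:
        return max(gap, 0.0)
    z = gap / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gap * cdf + sigma * pdf


def simulate_discovery(pool_x, pool_y, init_idx, n_steps):
    """Replay GP+EI discovery over a pre-computed pool; reward = best value found so far."""
    chosen = list(init_idx)
    rewards: List[float] = []
    for _ in range(n_steps):
        candidates = [i for i in range(len(pool_x)) if i not in chosen]
        if not candidates:
            break
        gp = GP([pool_x[i] for i in chosen], [pool_y[i] for i in chosen])
        best = max(pool_y[i] for i in chosen)
        nxt = max(candidates, key=lambda i: expected_improvement(*gp.predict(pool_x[i]), best))
        chosen.append(nxt)  # "acquire" the label by looking it up in the pool
        rewards.append(max(pool_y[i] for i in chosen))
    return chosen, rewards


if __name__ == "__main__":
    # Hypothetical toy pool standing in for a pre-computed materials dataset.
    pool_x = [i / 39 for i in range(40)]
    pool_y = [x * math.sin(6.0 * x) for x in pool_x]
    chosen, rewards = simulate_discovery(pool_x, pool_y, init_idx=[0, 13, 26], n_steps=8)
    print("selected:", chosen[3:])
    print("reward trajectory:", [round(r, 3) for r in rewards])
```

Because every label is looked up in the pool, this replay can only ever select materials that were already computed, so the reward trajectory it reports depends on how that collection was assembled.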
Related papers
- Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
(2024-09-10T07:34:19Z)
- Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL).
It exploits the dependency between a set of predictors with continuous predictive scores to rank the predictors without labeled data and to combine them into a weighted ensemble score.
The efficacy of the proposed methods is rigorously assessed through both simulation studies and a real-world application to risk gene discovery.
(2024-08-14T20:14:42Z)
- Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short when only a few source samples are available.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
(2024-08-05T03:18:58Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
(2024-05-29T01:32:17Z)
- Performative Prediction with Bandit Feedback: Learning through Reparameterization [23.039885534575966]
Performative prediction is a framework for studying social prediction in which the data distribution itself changes in response to the deployment of a model.
We develop a reparameterization that expresses the performative prediction objective as a function of the induced data distribution.
(2023-05-01T21:31:29Z)
- Functional Ensemble Distillation [18.34081591772928]
We investigate how to best distill an ensemble's predictions using an efficient model.
We find that training the distilled model with a simple augmentation scheme, mixup, significantly boosts performance.
(2022-06-05T14:07:17Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
(2022-01-11T23:01:12Z)
- Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in prior literature.
We formulate a novel problem and propose Back2Future, a neural framework that refines a given model's predictions in real time.
(2021-06-08T14:48:20Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
(2020-10-05T20:03:13Z)
- Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models.
We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
(2020-04-06T13:19:54Z)
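The ATC entry in the list above compresses the method into a single sentence. A minimal Python sketch of the idea, assuming per-example confidence scores (for instance maximum softmax probabilities) are already available: choose a threshold on labeled source data so that the fraction of source examples above it matches the source accuracy, then report the fraction of unlabeled target examples above that threshold. The function names, the toy scores, and the brute-force threshold search are illustrative choices, not the authors' implementation.

```python
from typing import Sequence


def fraction_above(conf: Sequence[float], t: float) -> float:
    """Fraction of confidence scores strictly greater than threshold t."""
    return sum(c > t for c in conf) / len(conf)


def fit_atc_threshold(source_conf: Sequence[float], source_correct: Sequence[bool]) -> float:
    """Choose t so that fraction_above(source_conf, t) is closest to source accuracy.

    Brute-force search over the observed scores plus one value below all of
    them; a hypothetical, illustrative search strategy.
    """
    accuracy = sum(source_correct) / len(source_correct)
    candidates = [min(source_conf) - 1.0] + sorted(set(source_conf))
    return min(candidates, key=lambda t: abs(fraction_above(source_conf, t) - accuracy))


def predict_target_accuracy(target_conf: Sequence[float], t: float) -> float:
    """ATC-style estimate: share of unlabeled target examples whose confidence exceeds t."""
    return fraction_above(target_conf, t)


if __name__ == "__main__":
    # Hypothetical toy scores: 3 of the 5 source predictions are correct.
    src_conf = [0.9, 0.8, 0.7, 0.6, 0.5]
    src_correct = [True, True, True, False, False]
    t = fit_atc_threshold(src_conf, src_correct)
    print(t, predict_target_accuracy([0.95, 0.65, 0.55, 0.4], t))  # 0.6 0.5
```

No target labels enter the estimate; the only assumption carried over from the source data is that the learned confidence threshold transfers to the target distribution.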