Outlier-Based Domain of Applicability Identification for Materials
Property Prediction Models
- URL: http://arxiv.org/abs/2302.06454v1
- Date: Tue, 17 Jan 2023 07:51:12 GMT
- Title: Outlier-Based Domain of Applicability Identification for Materials
Property Prediction Models
- Authors: Gihan Panapitiya and Emily Saldanha
- Abstract summary: We propose a method to find domains of applicability using a large feature space and also introduce analysis techniques to gain more insight into the detected domains.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models have been widely applied for material property
prediction. However, practical application of these models can be hindered by a
lack of information about how well they will perform on previously unseen types
of materials. Because machine learning model predictions depend on the quality
of the available training data, different domains of the material feature space
are predicted with different accuracy levels by such models. The ability to
identify such domains makes it possible to estimate the confidence level of each
prediction, to determine when and how the model should be employed depending on
the prediction accuracy requirements of different tasks, and to improve the
model for domains with high errors. In this work, we propose a method to find
domains of applicability using a large feature space and also introduce
analysis techniques to gain more insight into the detected domains and
subdomains.
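The core idea above — that regions of feature space poorly covered by training data tend to be predicted less reliably — can be illustrated with a simple outlier-based domain check. The sketch below is only an illustrative stand-in, not the paper's algorithm: it flags test points whose standardized distance to the training distribution exceeds a percentile cutoff, and all names and thresholds are assumptions.

```python
import numpy as np

def domain_flags(X_train, X_test, pct=95):
    """Flag test points as inside (True) or outside (False) a model's
    domain of applicability, using distance to the training feature
    distribution. Illustrative sketch, not the paper's method."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0) + 1e-12
    # Standardized Euclidean distance of each point to the training mean
    d_train = np.linalg.norm((X_train - mu) / sd, axis=1)
    d_test = np.linalg.norm((X_test - mu) / sd, axis=1)
    # Points farther than the pct-th percentile of training distances
    # are treated as outliers, i.e. outside the applicability domain
    cutoff = np.percentile(d_train, pct)
    return d_test <= cutoff

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(500, 8))
X_in = rng.normal(0, 1, size=(5, 8))    # drawn from the training distribution
X_out = rng.normal(6, 1, size=(5, 8))   # far outside it
flags = domain_flags(X_train, np.vstack([X_in, X_out]))
```

In practice, a per-domain error estimate (rather than a single binary flag) would be attached to each region, so that predictions can be accompanied by a confidence level.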
Related papers
- Comparative Evaluation of Applicability Domain Definition Methods for Regression Models [0.0]
Applicability domain refers to the range of data for which the prediction of a predictive model is expected to be reliable and accurate.
We propose a novel approach based on non-deterministic Bayesian neural networks to define the applicability domain of the model.
arXiv Detail & Related papers (2024-11-01T14:12:57Z)
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study the multi-source Domain Generalization of text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Determining Domain of Machine Learning Models using Kernel Density Estimates: Applications in Materials Property Prediction [1.8551396341435895]
We develop a new approach of assessing model domain using kernel density estimation.
We show that chemical groups considered unrelated based on established chemical knowledge exhibit significant dissimilarities by our measure.
High measures of dissimilarity are associated with poor model performance and poor estimates of model uncertainty.
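The density-based idea in this entry can be sketched with a standard kernel density estimate: fit a KDE to the training features and treat test points whose log-density falls below a low percentile of the training log-densities as outside the model domain. This is a minimal sketch assuming `scipy.stats.gaussian_kde` and an arbitrary 5% cutoff, not the paper's actual dissimilarity measure or features.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Fit a KDE to the training feature distribution; low density at a
# test point suggests the point lies outside the model's domain.
rng = np.random.default_rng(1)
X_train = rng.normal(0, 1, size=(300, 2))     # 300 samples, 2 features
kde = gaussian_kde(X_train.T)                 # scipy expects shape (d, n)

log_dens_train = kde.logpdf(X_train.T)
threshold = np.percentile(log_dens_train, 5)  # bottom 5% of training density

near = np.array([[0.1, -0.2]])                # close to the training data
far = np.array([[8.0, 8.0]])                  # far from the training data
in_domain = lambda x: kde.logpdf(x.T)[0] >= threshold
```

As the entry notes, low density (high dissimilarity) regions are where both model performance and uncertainty estimates tend to degrade.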
arXiv Detail & Related papers (2024-05-28T15:41:16Z)
- Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z)
- Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction [6.349503549199403]
We introduce our model Materials Informatics Transformer (MatInFormer) for material property prediction.
Specifically, we introduce a novel approach that involves learning the grammar of crystallography through the tokenization of pertinent space group information.
arXiv Detail & Related papers (2023-08-30T18:34:55Z)
- Entity Aware Modelling: A Survey [22.32009539611539]
Recent machine learning advances have led to new state-of-the-art response prediction models.
Models built at a population level often lead to sub-optimal performance in many personalized prediction settings.
In personalized prediction, the goal is to incorporate inherent characteristics of different entities to improve prediction performance.
arXiv Detail & Related papers (2023-02-16T16:33:33Z)
- Assessing Out-of-Domain Language Model Performance from Few Examples [38.245449474937914]
We address the task of predicting out-of-domain (OOD) performance in a few-shot fashion.
We benchmark the performance on this task when looking at model accuracy on the few-shot examples.
We show that attribution-based factors can help rank relative model OOD performance.
arXiv Detail & Related papers (2022-10-13T04:45:26Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making predictions over new test data without any opportunity to learn from a training set of labelled data.
Instead, the learner is given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - Spatial machine-learning model diagnostics: a model-agnostic
distance-based approach [91.62936410696409]
This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools.
The SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences and also relevant similarities.
The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
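One way to picture a spatial prediction error profile is to bin test points by distance to the nearest training location and report the mean absolute error per bin. The sketch below follows that idea under stated assumptions — the function name, binning scheme, and synthetic data are illustrative, not the SPEP implementation from the paper.

```python
import numpy as np

def spatial_error_profile(train_xy, test_xy, errors, bins):
    """Mean absolute prediction error, binned by distance from each
    test location to its nearest training location. Illustrative
    sketch of a distance-based error profile."""
    # Pairwise distances from each test point to all training points
    d = np.sqrt(((test_xy[:, None, :] - train_xy[None, :, :]) ** 2).sum(-1))
    nearest = d.min(axis=1)
    idx = np.digitize(nearest, bins)
    return np.array([np.abs(errors[idx == i]).mean() if (idx == i).any()
                     else np.nan for i in range(1, len(bins))])

rng = np.random.default_rng(2)
train_xy = rng.uniform(0, 10, size=(200, 2))
test_xy = rng.uniform(0, 20, size=(300, 2))   # extends beyond the training region
# Synthetic errors that grow with distance from the training area
d_near = np.sqrt(((test_xy[:, None] - train_xy[None]) ** 2).sum(-1)).min(1)
errors = d_near + rng.normal(0, 0.1, size=300)
profile = spatial_error_profile(train_xy, test_xy, errors,
                                bins=np.array([0.0, 2.0, 5.0, 15.0]))
```

A rising profile — larger errors at greater prediction distances — is exactly the kind of pattern such diagnostics are designed to expose when a model is applied outside its training region.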
arXiv Detail & Related papers (2021-11-13T01:50:36Z) - Model-agnostic multi-objective approach for the evolutionary discovery
of mathematical models [55.41644538483948]
In modern data science, it is often more important to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Transformer Based Multi-Source Domain Adaptation [53.24606510691877]
In practical machine learning settings, the data on which a model must make predictions often come from a different distribution than the data it was trained on.
Here, we investigate the problem of unsupervised multi-source domain adaptation, where a model is trained on labelled data from multiple source domains and must make predictions on a domain for which no labelled data has been seen.
We show that the predictions of large pretrained transformer-based domain experts are highly homogeneous, making it challenging to learn effective functions for mixing their predictions.
arXiv Detail & Related papers (2020-09-16T16:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.