Comparative Evaluation of Applicability Domain Definition Methods for Regression Models
- URL: http://arxiv.org/abs/2411.00920v1
- Date: Fri, 01 Nov 2024 14:12:57 GMT
- Title: Comparative Evaluation of Applicability Domain Definition Methods for Regression Models
- Authors: Shakir Khurshid, Bharath Kumar Loganathan, Matthieu Duvinage
- Abstract summary: The applicability domain refers to the range of data for which a predictive model's predictions are expected to be reliable and accurate.
We propose a novel approach based on non-deterministic Bayesian neural networks to define the applicability domain of the model.
- Abstract: The applicability domain refers to the range of data for which a predictive model's predictions are expected to be reliable and accurate; using a model outside its applicability domain can lead to incorrect results. The ability to define the regions of data space where a predictive model can be safely used is a necessary condition for reliable predictions on new data. However, defining the applicability domain of a model is a challenging problem, as there is no clear, universal definition or metric for it. This work aims to make the applicability domain more quantifiable and pragmatic. Eight applicability domain detection techniques were applied to seven regression models, trained on five different datasets, and their performance was benchmarked using a validation framework. We also propose a novel approach based on non-deterministic Bayesian neural networks to define the applicability domain of a model. Our method defined the applicability domain more accurately than previous methods, highlighting its potential in this regard.
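The abstract ships no code, but the core idea, using the spread of a non-deterministic Bayesian neural network's predictions to decide whether an input lies inside the applicability domain, can be sketched with Monte Carlo dropout as the Bayesian approximation. Everything below (architecture, sample count, the `std_cutoff` threshold) is an illustrative assumption, not the authors' exact method.

```python
# Illustrative sketch only: Monte Carlo dropout as an approximate Bayesian
# neural network, flagging inputs whose predictive uncertainty exceeds a
# cutoff as outside the applicability domain (AD). The architecture,
# sample count, and threshold are assumptions, not the paper's values.
import torch
import torch.nn as nn

class MCDropoutRegressor(nn.Module):
    def __init__(self, in_dim, hidden=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def predict_with_ad(model, x, n_samples=50, std_cutoff=0.5):
    """Return mean prediction, predictive std, and an in-AD mask."""
    model.train()  # keep dropout active so each pass samples a sub-network
    preds = torch.stack([model(x) for _ in range(n_samples)])  # (S, N, 1)
    mean, std = preds.mean(dim=0), preds.std(dim=0)
    in_domain = std.squeeze(-1) < std_cutoff  # low spread -> trust prediction
    return mean, std, in_domain

model = MCDropoutRegressor(in_dim=10)
x = torch.randn(8, 10)
mean, std, in_domain = predict_with_ad(model, x)
```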
Related papers
- Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
arXiv Detail & Related papers (2024-08-05T03:18:58Z)
- Determining Domain of Machine Learning Models using Kernel Density Estimates: Applications in Materials Property Prediction [1.8551396341435895]
We develop a new approach to assessing model domain using kernel density estimation.
We show that chemical groups considered unrelated based on established chemical knowledge exhibit significant dissimilarities by our measure.
High measures of dissimilarity are associated with poor model performance and poor estimates of model uncertainty.
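A minimal sketch of the kernel-density idea, assuming a Gaussian kernel and a 5th-percentile density cutoff (both illustrative choices, not the cited paper's settings): fit a KDE on the training features and treat test points whose log-density falls below the cutoff as outside the model's domain.

```python
# Sketch: KDE-based domain check. Bandwidth and the 5th-percentile cutoff
# are illustrative assumptions, not values from the cited paper.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))         # stand-in for training features
X_test = rng.normal(loc=3.0, size=(10, 4))  # shifted, likely out of domain

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_train)
cutoff = np.percentile(kde.score_samples(X_train), 5)  # low-density fringe
in_domain = kde.score_samples(X_test) >= cutoff
print(in_domain)
```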
arXiv Detail & Related papers (2024-05-28T15:41:16Z)
- Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale [53.152460508207184]
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
This paper proposes a novel approach that considers multiple prediction hypotheses for each sample and investigates the rationale behind each hypothesis.
To achieve optimal performance, we propose a three-step adaptation process: model pre-adaptation, hypothesis consolidation, and semi-supervised learning.
arXiv Detail & Related papers (2024-02-02T05:53:22Z)
- Finite Sample Confidence Regions for Linear Regression Parameters Using Arbitrary Predictors [1.6860963320038902]
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor.
The derived confidence regions can be cast as constraints within a Mixed Linear Programming framework, enabling optimisation of linear objectives.
Unlike previous methods, the confidence region can be empty, which can be used for hypothesis testing.
arXiv Detail & Related papers (2024-01-27T00:15:48Z)
- Outlier-Based Domain of Applicability Identification for Materials Property Prediction Models [0.38073142980733]
We propose a method to find domains of applicability using a large feature space and also introduce analysis techniques to gain more insight into the detected domains.
arXiv Detail & Related papers (2023-01-17T07:51:12Z)
- Uncertainty Quantification for Rule-Based Models [0.03807314298073299]
Rule-based classification models directly predict values, rather than modeling a probability and translating it into a prediction as done in statistical models.
We propose an uncertainty quantification framework in the form of a meta-model that takes any binary classifier as a black box and estimates, for a given input, the prediction accuracy of that base model along with a level of confidence in that estimate.
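One way to picture such a meta-model (a loose sketch under assumed placeholder models, not the cited framework's implementation): train a second classifier to predict whether the black-box base model is correct at a given input, and read its predicted probability as a local accuracy estimate.

```python
# Sketch: a meta-model that estimates, per input, the probability that a
# black-box base classifier is correct. Models and splits are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_base, X_meta, y_base, y_meta = train_test_split(X, y, random_state=0)

base = DecisionTreeClassifier(max_depth=3).fit(X_base, y_base)  # black box
correct = (base.predict(X_meta) == y_meta).astype(int)          # meta labels
# Assumes both correct and incorrect cases occur in the held-out split.
meta = RandomForestClassifier(random_state=0).fit(X_meta, correct)

x_new = X_meta[:5]
est_accuracy = meta.predict_proba(x_new)[:, 1]  # P(base is correct | x)
```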
arXiv Detail & Related papers (2022-11-03T15:50:09Z)
- Variational Model Perturbation for Source-Free Domain Adaptation [64.98560348412518]
We introduce perturbations into the model parameters by variational Bayesian inference in a probabilistic framework.
We demonstrate the theoretical connection to learning Bayesian neural networks, which proves the generalizability of the perturbed model to target domains.
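A rough sketch of the parameter-perturbation step, with a fixed Gaussian scale standing in for the variationally learned variance (the `sigma` value and placeholder model are assumptions for illustration):

```python
# Sketch: draw a perturbed copy of a source model's parameters,
# theta' = theta + sigma * eps, eps ~ N(0, I). In the cited work the
# perturbation scale is learned variationally; the fixed sigma here
# is an assumption.
import copy
import torch
import torch.nn as nn

source_model = nn.Linear(10, 2)  # placeholder for a pre-trained source model

def perturb(model, sigma=0.01):
    perturbed = copy.deepcopy(model)
    with torch.no_grad():
        for p in perturbed.parameters():
            p.add_(sigma * torch.randn_like(p))
    return perturbed

target_model = perturb(source_model)  # adapt or ensemble from such samples
```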
arXiv Detail & Related papers (2022-10-19T08:41:19Z)
- Uncertainty-guided Source-free Domain Adaptation [77.3844160723014]
Source-free domain adaptation (SFDA) aims to adapt a classifier to an unlabelled target data set by only using a pre-trained source model.
We propose quantifying the uncertainty in the source model predictions and utilizing it to guide the target adaptation.
arXiv Detail & Related papers (2022-08-16T08:03:30Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
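ATC is simple enough to sketch on synthetic numbers (the arrays and names below are made up for the demo): choose the threshold so that the pass rate on labeled source data matches source accuracy, then report the pass rate on unlabeled target data as the accuracy estimate.

```python
# Sketch of Average Thresholded Confidence (ATC) on synthetic confidences.
# Confidences stand in for max softmax scores; all data here is made up.
import numpy as np

def fit_atc_threshold(source_conf, source_correct):
    """Choose t so that mean(source_conf >= t) matches source accuracy."""
    acc = source_correct.mean()
    candidates = np.sort(source_conf)
    # Fraction of confidences >= candidates[i] is (n - i) / n.
    pass_rates = 1.0 - np.arange(len(candidates)) / len(candidates)
    return candidates[np.argmin(np.abs(pass_rates - acc))]

def atc_estimate(target_conf, t):
    return (target_conf >= t).mean()  # predicted target accuracy

rng = np.random.default_rng(0)
source_conf = rng.uniform(0.5, 1.0, 1000)
source_correct = (rng.uniform(size=1000) < source_conf).astype(float)
t = fit_atc_threshold(source_conf, source_correct)
print(atc_estimate(rng.uniform(0.4, 1.0, 1000), t))
```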
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)