Beyond Average Performance -- exploring regions of deviating performance
for black box classification models
- URL: http://arxiv.org/abs/2109.08216v1
- Date: Thu, 16 Sep 2021 20:46:52 GMT
- Title: Beyond Average Performance -- exploring regions of deviating performance
for black box classification models
- Authors: Luis Torgo and Paulo Azevedo and Ines Areosa
- Abstract summary: We describe two approaches that can be used to provide interpretable descriptions of the expected performance of any black box classification model.
These approaches are of high practical relevance as they provide means to uncover and describe, in an interpretable way, situations where a model's performance is expected to deviate significantly from its average behaviour.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine learning models are becoming increasingly popular in many
different settings, driven mainly by their ability to achieve a level of
predictive performance that is hard for human experts to match in this era of
big data. With this growth in usage comes an increasing demand for
accountability and for understanding of the models' predictions. However, the
sophistication of the most successful models (e.g. ensembles, deep learning)
is becoming a major obstacle to this endeavour, as these models are
essentially black boxes. In this paper we describe two general approaches that
can be used to provide interpretable descriptions of the expected performance
of any black box classification model. These approaches are of high practical
relevance because they provide means to uncover and describe, in an
interpretable way, situations where a model's performance is expected to
deviate significantly from its average behaviour. This may be of critical
relevance for applications where costly decisions are driven by the models'
predictions, as it can be used to warn end users against using the models in
specific cases.
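The abstract describes the two approaches only at a high level, but the general recipe can be illustrated. Below is a minimal sketch, not the authors' exact method: fit an interpretable meta-model (here a shallow decision tree) on a black box's per-instance errors, so that its leaves describe feature-space regions whose expected error deviates from the average. The dataset, the random forest black box, and all hyperparameters are illustrative assumptions.

```python
# Sketch: surface regions where a black box's error rate deviates from average.
# All modelling choices below are illustrative, not the paper's algorithm.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_fit, X_meta, y_fit, y_meta = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

# 1. Train the black box on one half of the data.
black_box = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)

# 2. On held-out data, record where the black box errs (0/1 per instance).
errors = (black_box.predict(X_meta) != y_meta).astype(int)
print(f"average error rate: {errors.mean():.3f}")

# 3. Fit a shallow tree to predict the error signal from the input features;
#    each leaf is an interpretable region of the feature space.
meta = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30, random_state=0)
meta.fit(X_meta, errors)

# 4. Print the tree's rules with per-leaf counts of correct vs. erroneous
#    predictions, exposing regions with unusually high (or low) error.
feature_names = load_breast_cancer().feature_names.tolist()
print(export_text(meta, feature_names=feature_names, show_weights=True))
```

Leaves whose error rate sits well above the printed average are exactly the kind of regions the paper proposes to surface as warnings to end users; a rule learner, or a regression tree over a continuous loss, would serve the same purpose.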
Related papers
- AALF: Almost Always Linear Forecasting [3.336367986372977]
We argue that simple models are good enough most of the time, and forecasting performance can be improved by choosing a Deep Learning method only for certain predictions.
An empirical study on various real-world datasets shows that our selection methodology outperforms state-of-the-art online model selection methods in most cases.
arXiv Detail & Related papers (2024-09-16T10:13:09Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - A roadmap to fair and trustworthy prediction model validation in
healthcare [2.476158303361112]
A prediction model is most useful if it generalizes beyond the development data.
We propose a roadmap that facilitates the development and application of reliable, fair, and trustworthy artificial intelligence prediction models.
arXiv Detail & Related papers (2023-04-07T04:24:19Z) - Operationalizing Specifications, In Addition to Test Sets for Evaluating
Constrained Generative Models [17.914521288548844]
We argue that the scale of generative models could be exploited to raise the abstraction level at which evaluation itself is conducted.
Our recommendations are based on leveraging specifications as a powerful instrument to evaluate generation quality.
arXiv Detail & Related papers (2022-11-19T06:39:43Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that no single model works best in all cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - Thief, Beware of What Get You There: Towards Understanding Model
Extraction Attack [13.28881502612207]
In some scenarios, AI models are trained proprietarily, where neither pre-trained models nor sufficient in-distribution data is publicly available.
We find the effectiveness of existing techniques significantly affected by the absence of pre-trained models.
We formulate model extraction attacks into an adaptive framework that captures these factors with deep reinforcement learning.
arXiv Detail & Related papers (2021-04-13T03:46:59Z) - Design of Dynamic Experiments for Black-Box Model Discrimination [72.2414939419588]
Consider a dynamic model discrimination setting where we wish to choose: (i) the best mechanistic, time-varying model and (ii) the best model parameter estimates.
For rival mechanistic models where we have access to gradient information, we extend existing methods to incorporate a wider range of problem uncertainty.
We replace these black-box models with Gaussian process surrogate models and thereby extend the model discrimination setting to additionally incorporate rival black-box models.
arXiv Detail & Related papers (2021-02-07T11:34:39Z) - Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual
Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.