Interpretable Companions for Black-Box Models
- URL: http://arxiv.org/abs/2002.03494v2
- Date: Tue, 11 Feb 2020 05:38:05 GMT
- Title: Interpretable Companions for Black-Box Models
- Authors: Danqing Pan, Tong Wang, Satoshi Hara
- Abstract summary: We present an interpretable companion model for any pre-trained black-box classifiers.
For any input, a user can decide to either receive a prediction from the black-box model, with high accuracy but no explanations, or employ a companion rule to obtain an interpretable prediction with slightly lower accuracy.
The companion model is trained from data and the predictions of the black-box model, with the objective combining area under the transparency--accuracy curve and model complexity.
- Score: 13.39487972552112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an interpretable companion model for any pre-trained black-box classifier. The idea is that, for any input, a user can decide either to receive a prediction from the black-box model, with high accuracy but no explanation, or to employ a companion rule and obtain an interpretable prediction with slightly lower accuracy. The companion model is trained from the data and the predictions of the black-box model, with an objective that combines the area under the transparency--accuracy curve with model complexity. Our model gives flexible choices to practitioners who face the dilemma of choosing between always using interpretable models and always using black-box models for a predictive task: for any given input, a user can fall back on an interpretable prediction if its performance is satisfactory, or stick with the black-box model if the rules are not. To show the value of companion models, we design a human evaluation with more than a hundred participants to investigate how much accuracy people are willing to give up in exchange for interpretability.
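To make the mechanism above concrete, here is a minimal Python sketch of how a companion rule list interacts with a black-box classifier and how a transparency--accuracy curve arises. Everything in it (the toy rules, the `blackbox_predict` stub, the synthetic data, and the trapezoidal area computation) is an illustrative assumption, not the authors' implementation or training objective; it only shows the per-input "rule or black box" decision and the curve whose area the paper's objective rewards.

```python
import numpy as np

# Hypothetical companion rule list: each rule is (condition, predicted label),
# applied in order. These rules and the black-box stub below are illustrative
# assumptions, not the model learned in the paper.
rules = [
    (lambda x: x[0] > 0.8, 1),
    (lambda x: x[1] < 0.2, 0),
    (lambda x: x[0] + x[1] > 1.0, 1),
]

def blackbox_predict(x):
    """Stand-in for any pre-trained black-box classifier."""
    return int(x[0] + 0.5 * x[1] > 0.7)

def companion_predict(x, num_rules):
    """Return (prediction, covered): use the first matching rule among the
    first `num_rules` rules, otherwise defer to the black-box model."""
    for condition, label in rules[:num_rules]:
        if condition(x):
            return label, True           # interpretable prediction
    return blackbox_predict(x), False    # fall back to the black box

# Toy data with ground-truth labels (chosen so the black box is very accurate).
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.7).astype(int)

# Sweep the number of active rules to trace a transparency--accuracy curve:
# transparency = fraction of inputs answered by a rule, accuracy = overall accuracy.
transparency, accuracy = [], []
for k in range(len(rules) + 1):
    preds, covered = zip(*(companion_predict(x, k) for x in X))
    transparency.append(float(np.mean(covered)))
    accuracy.append(float(np.mean(np.array(preds) == y)))

# Area under the transparency--accuracy curve; the paper's training objective
# combines this kind of area with a penalty on rule-set complexity.
autac = float(np.trapz(accuracy, transparency))
print("transparency:", [round(t, 2) for t in transparency])
print("accuracy:    ", [round(a, 2) for a in accuracy])
print("area under transparency--accuracy curve:", round(autac, 3))
```

Sweeping the number of active rules traces rule coverage (transparency) against overall accuracy; the paper additionally penalizes model complexity when learning the rules themselves.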
Related papers
- Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering [46.823415680462844]
We study the possibility of selective prediction for vision-language models in a realistic, black-box setting.
We propose using the principle of neighborhood consistency to identify unreliable responses from a black-box vision-language model in question answering tasks (a minimal sketch of this idea follows the list below).
arXiv Detail & Related papers (2024-04-16T00:28:26Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Design of Dynamic Experiments for Black-Box Model Discrimination [72.2414939419588]
Consider a dynamic model discrimination setting where we wish to choose (i) the best mechanistic, time-varying model and (ii) the best model parameter estimates.
For rival mechanistic models where we have access to gradient information, we extend existing methods to incorporate a wider range of problem uncertainty.
We replace these black-box models with Gaussian process surrogate models and thereby extend the model discrimination setting to additionally incorporate rival black-box models.
arXiv Detail & Related papers (2021-02-07T11:34:39Z)
- Augmented Fairness: An Interpretable Model Augmenting Decision-Makers' Fairness [10.53972370889201]
We propose a model-agnostic approach for mitigating the prediction bias of a black-box decision-maker.
Our method detects the regions of the feature space where the black-box decision-maker is biased and, in those regions, replaces it with a few short decision rules that act as a "fair surrogate".
arXiv Detail & Related papers (2020-11-17T03:25:44Z)
- A Causal Lens for Peeking into Black Box Predictive Models: Predictive Model Interpretation via Causal Attribution [3.3758186776249928]
We aim to address the problem of interpreting predictive models in settings where the model is a black box.
We reduce the problem of interpreting a black box predictive model to that of estimating the causal effects of each of the model inputs on the model output.
We show how the resulting causal attribution of responsibility for model output to the different model inputs can be used to interpret the predictive model and to explain its predictions.
arXiv Detail & Related papers (2020-08-01T23:20:57Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
- Concept Bottleneck Models [79.91795150047804]
State-of-the-art models today do not typically support the manipulation of concepts like "the existence of bone spurs".
We revisit the classic idea of first predicting concepts that are provided at training time, and then using these concepts to predict the label (a minimal two-stage sketch follows the list below).
On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models.
arXiv Detail & Related papers (2020-07-09T07:47:28Z)
- In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction [19.346391120556884]
This study trains interpretable models that output probabilities rather than binary predictions, and uses quantitative fairness definitions to assess the models.
We generated black-box and interpretable ML models on two different criminal recidivism datasets from Florida and Kentucky.
Several interpretable ML models can predict recidivism as well as black-box ML models and are more accurate than COMPAS or the Arnold PSA.
arXiv Detail & Related papers (2020-05-08T17:16:31Z)
- Learning Global Transparent Models Consistent with Local Contrastive Explanations [34.86847988157447]
Based on a key insight, we propose a novel method: we create custom features from sparse local contrastive explanations of the black-box model and then train a globally transparent model on just these features.
arXiv Detail & Related papers (2020-02-19T15:45:42Z)
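As a companion to the first related paper above (selective prediction via neighborhood consistency), here is a minimal, hypothetical sketch of the general idea: query the black-box model with semantically equivalent neighbors of a question and abstain when the answers disagree. The `query_vlm` stub, the paraphrase list, and the 0.7 agreement threshold are assumptions for illustration, not the authors' method.

```python
from collections import Counter

def query_vlm(image, question):
    """Stand-in for a black-box vision-language model accessed only via queries.
    Hypothetical stub: a real system would call a deployed model or API."""
    canned = {
        "What color is the car?": "red",
        "What colour is the car?": "red",
        "Which color does the car have?": "blue",  # an inconsistent neighbor
    }
    return canned.get(question, "unknown")

def selective_answer(image, question, neighbors, min_agreement=0.7):
    """Answer only if the model is consistent across neighboring questions,
    otherwise abstain (return None)."""
    answers = [query_vlm(image, q) for q in [question] + neighbors]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)
    return top_answer if agreement >= min_agreement else None

# Example: two paraphrases of the original question act as its "neighborhood".
result = selective_answer(
    image=None,
    question="What color is the car?",
    neighbors=["What colour is the car?", "Which color does the car have?"],
)
print("prediction:", result)  # agreement is 2/3 < 0.7, so the model abstains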
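Similarly, for the Concept Bottleneck Models entry, the sketch below illustrates the two-stage structure the summary describes: predict human-interpretable concepts from the input, then predict the label from the concepts alone. The synthetic data, the concept definitions, and the choice of logistic regression are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: raw inputs X, human-labelled binary concepts C, final label y.
# In the paper's setting the concepts (e.g. "bone spurs present") are annotated
# at training time; here they are generated from X purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
C = (X[:, :3] + 0.1 * rng.normal(size=(500, 3)) > 0).astype(int)  # 3 concepts
y = (C.sum(axis=1) >= 2).astype(int)            # label depends only on concepts

# Stage 1: one concept predictor per concept, mapping x -> c_hat.
concept_models = [LogisticRegression().fit(X, C[:, j]) for j in range(C.shape[1])]

# Stage 2: label predictor that sees ONLY the concepts (the "bottleneck"), c -> y.
label_model = LogisticRegression().fit(C, y)

def predict(x_batch):
    """Concept-bottleneck prediction: x -> predicted concepts -> label."""
    c_hat = np.column_stack([m.predict(x_batch) for m in concept_models])
    return label_model.predict(c_hat), c_hat

# Because the label depends only on concepts, a user can inspect (or even edit)
# c_hat before the final prediction is made.
preds, concepts = predict(X[:5])
print("predicted concepts:\n", concepts)
print("predicted labels:   ", preds)
```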
This list is automatically generated from the titles and abstracts of the papers on this site.