Learning Global Transparent Models Consistent with Local Contrastive
Explanations
- URL: http://arxiv.org/abs/2002.08247v4
- Date: Thu, 29 Oct 2020 00:34:34 GMT
- Title: Learning Global Transparent Models Consistent with Local Contrastive
Explanations
- Authors: Tejaswini Pedapati, Avinash Balakrishnan, Karthikeyan Shanmugam and
Amit Dhurandhar
- Abstract summary: Based on a key insight, we propose a novel method where we create custom features from sparse local contrastive explanations of the black-box model and then train a globally transparent model on just these.
- Score: 34.86847988157447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a rich and growing literature on producing local
contrastive/counterfactual explanations for black-box models (e.g. neural
networks).
In these methods, for an input, an explanation is in the form of a contrast
point differing in very few features from the original input and lying in a
different class. Other works try to build globally interpretable models like
decision trees and rule lists based on the data using actual labels or based on
the black-box model's predictions. Although these interpretable global models
can be useful, they may not be consistent with local explanations from a
specific black-box of choice. In this work, we explore the question: Can we
produce a transparent global model that is simultaneously accurate and
consistent with the local (contrastive) explanations of the black-box model? We
introduce a natural local consistency metric that quantifies if the local
explanations and predictions of the black-box model are also consistent with
the proxy global transparent model. Based on a key insight, we propose a novel
method where we create custom boolean features from sparse local contrastive
explanations of the black-box model and then train a globally transparent model
on just these, and showcase empirically that such models have higher local
consistency compared with other known strategies, while still being close in
performance to models that are trained with access to the original data.
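As a rough illustration of the pipeline described in the abstract, the sketch below builds boolean features from given contrast points, trains a small decision tree as the transparent surrogate, and scores one plausible reading of the local consistency metric (agreement of the surrogate with the black box on each input and its contrast point). The midpoint-thresholding rule and the names X, contrasts, and blackbox_predict are illustrative assumptions, not the paper's exact construction.

    # Minimal sketch, assuming contrastive explanations are already available as
    # an array `contrasts` where contrasts[i] is the sparse contrast point for X[i].
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_boolean_featurizer(X, contrasts):
        # One threshold per feature that any contrastive explanation changed,
        # placed at the average midpoint between inputs and their contrast points.
        changed = np.any(X != contrasts, axis=0)
        cols = np.where(changed)[0]
        thresholds = (0.5 * (X[:, cols] + contrasts[:, cols])).mean(axis=0)
        return cols, thresholds

    def to_boolean(X, cols, thresholds):
        # Boolean custom features: does each input exceed the derived threshold?
        return (X[:, cols] >= thresholds).astype(int)

    def local_consistency(surrogate, blackbox_predict, X, contrasts, cols, thr):
        # Fraction of examples where the transparent surrogate agrees with the
        # black box on both the original input and its contrast point.
        g_x = surrogate.predict(to_boolean(X, cols, thr))
        g_c = surrogate.predict(to_boolean(contrasts, cols, thr))
        return np.mean((g_x == blackbox_predict(X)) &
                       (g_c == blackbox_predict(contrasts)))

    # Usage (X, contrasts, blackbox_predict assumed given):
    # cols, thr = fit_boolean_featurizer(X, contrasts)
    # tree = DecisionTreeClassifier(max_depth=4)
    # tree.fit(to_boolean(X, cols, thr), blackbox_predict(X))
    # print("local consistency:",
    #       local_consistency(tree, blackbox_predict, X, contrasts, cols, thr))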
Related papers
- Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering [46.823415680462844]
We study the possibility of selective prediction for vision-language models in a realistic, black-box setting.
We propose using the principle of neighborhood consistency to identify unreliable responses from a black-box vision-language model in question answering tasks.
arXiv Detail & Related papers (2024-04-16T00:28:26Z) - Discriminative Feature Attributions: Bridging Post Hoc Explainability
and Inherent Interpretability [29.459228981179674]
Post hoc explanations incorrectly attribute high importance to features that are unimportant or non-discriminative for the underlying task.
Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture.
We propose Distractor Erasure Tuning (DiET), a method that adapts black-box models to be robust to distractor erasure.
arXiv Detail & Related papers (2023-07-27T17:06:02Z) - DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z) - Partial Order in Chaos: Consensus on Feature Attributions in the
Rashomon Set [50.67431815647126]
Post-hoc global/local feature attribution methods are being progressively employed to understand machine learning models.
We show that partial orders of local/global feature importance arise from this methodology.
We show that every relation among features present in these partial orders also holds in the rankings provided by existing approaches.
arXiv Detail & Related papers (2021-10-26T02:53:14Z) - Can Explanations Be Useful for Calibrating Black Box Models? [31.473798197405948]
We study how to improve a black box model's performance on a new domain given examples from the new domain.
Our approach first extracts a set of features combining human intuition about the task with model attributions.
We show that the calibration features transfer to some extent between tasks and shed light on how to effectively use them.
arXiv Detail & Related papers (2021-10-14T17:48:16Z) - Visualising Deep Network's Time-Series Representations [93.73198973454944]
Despite the popularisation of machine learning models, more often than not they still operate as black boxes with no insight into what is happening inside the model.
In this paper, a method that addresses that issue is proposed, with a focus on visualising multi-dimensional time-series data.
Experiments on a high-frequency stock market dataset show that the method provides fast and discernible visualisations.
arXiv Detail & Related papers (2021-03-12T09:53:34Z) - GLocalX -- From Local to Global Explanations of Black Box AI Models [12.065358125757847]
We present GLocalX, a "local-first" model agnostic explanation method.
Our goal is to learn accurate yet simple interpretable models to emulate the given black box, and, if possible, replace it entirely.
arXiv Detail & Related papers (2021-01-19T15:26:09Z) - VAE-LIME: Deep Generative Model Based Approach for Local Data-Driven
Model Interpretability Applied to the Ironmaking Industry [70.10343492784465]
It is necessary to expose to the process engineer not only the model predictions but also their interpretability.
Model-agnostic local interpretability solutions based on LIME have recently emerged to improve the original method.
We present in this paper a novel approach, VAE-LIME, for local interpretability of data-driven models forecasting the temperature of the hot metal produced by a blast furnace.
arXiv Detail & Related papers (2020-07-15T07:07:07Z) - Interpretable Companions for Black-Box Models [13.39487972552112]
We present an interpretable companion model for any pre-trained black-box classifiers.
For any input, a user can decide to either receive a prediction from the black-box model, with high accuracy but no explanations, or employ a companion rule to obtain an interpretable prediction with slightly lower accuracy.
The companion model is trained from data and the predictions of the black-box model, with the objective combining area under the transparency--accuracy curve and model complexity.
arXiv Detail & Related papers (2020-02-10T01:39:16Z)