Explainers in the Wild: Making Surrogate Explainers Robust to
Distortions through Perception
- URL: http://arxiv.org/abs/2102.10951v1
- Date: Mon, 22 Feb 2021 12:38:53 GMT
- Title: Explainers in the Wild: Making Surrogate Explainers Robust to
Distortions through Perception
- Authors: Alexander Hepburn, Raul Santos-Rodriguez
- Abstract summary: We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the Imagenet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
- Score: 77.34726150561087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explaining the decisions of models is becoming pervasive in the image
processing domain, whether it is by using post-hoc methods or by creating
inherently interpretable models. While the widespread use of surrogate
explainers is a welcome addition to inspect and understand black-box models,
assessing the robustness and reliability of the explanations is key for their
success. Additionally, whilst existing work in the explainability field
proposes various strategies to address this problem, the challenges of working
with data in the wild are often overlooked. For instance, in image
classification, distortions to images can affect not only the predictions
assigned by the model, but also the explanation. Given a clean and a distorted
version of an image, even if the prediction probabilities are similar, the
explanation may still be different. In this paper we propose a methodology to
evaluate the effect of distortions in explanations by embedding perceptual
distances that tailor the neighbourhoods used to train surrogate explainers.
We also show that by operating in this way, we can make the explanations more
robust to distortions. We generate explanations for images in the Imagenet-C
dataset and demonstrate how using perceptual distances in the surrogate
explainer creates more coherent explanations for the distorted and reference
images.
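To make the idea concrete, below is a minimal sketch (not the authors' code) of a LIME-style surrogate explainer whose sample weights come from a perceptual distance between the original and perturbed images, rather than from a distance in the binary interpretable space. The black-box `predict_fn`, the segment count, the exponential kernel, and the use of SSIM as a stand-in perceptual distance are illustrative assumptions; the paper's own choice of perceptual metric may differ.

```python
# Sketch of a perceptually weighted surrogate explainer (illustrative only).
# Assumes an RGB image with float values in [0, 1] and a black-box
# `predict_fn(image) -> float` returning the probability of the explained class.
import numpy as np
from skimage.segmentation import slic                      # superpixel segmentation
from skimage.metrics import structural_similarity          # stand-in perceptual metric
from sklearn.linear_model import Ridge


def perceptual_surrogate(image, predict_fn, n_samples=500, n_segments=50,
                         kernel_width=0.25, rng=None):
    """Fit a weighted linear surrogate in a neighbourhood around `image`."""
    rng = np.random.default_rng(rng)
    segments = slic(image, n_segments=n_segments, start_label=0)
    n_feats = int(segments.max()) + 1
    baseline = image.mean(axis=(0, 1))           # fill colour for switched-off segments

    # Binary interpretable representation: which superpixels are kept on.
    masks = rng.integers(0, 2, size=(n_samples, n_feats)).astype(float)
    masks[0, :] = 1.0                            # keep the unperturbed image in the set

    X, y, weights = masks, [], []
    for mask in masks:
        perturbed = image.copy()
        off = ~np.isin(segments, np.flatnonzero(mask))
        perturbed[off] = baseline
        y.append(predict_fn(perturbed))          # black-box output (assumed scalar)

        # Perceptual (dis)similarity to the original image shapes the
        # neighbourhood, instead of a cosine/Euclidean distance on `mask`.
        ssim = structural_similarity(image, perturbed, channel_axis=-1,
                                     data_range=1.0)
        dist = 1.0 - ssim
        weights.append(np.exp(-(dist ** 2) / kernel_width ** 2))

    surrogate = Ridge(alpha=1.0)
    surrogate.fit(X, np.asarray(y), sample_weight=np.asarray(weights))
    return surrogate.coef_, segments             # per-segment importances
```

Because perceptually similar perturbations receive similar weight regardless of low-level pixel distortion, the fitted coefficients should vary less between a clean image and its distorted counterpart, which is the kind of coherence the paper evaluates on Imagenet-C.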
Related papers
- CNN-based explanation ensembling for dataset, representation and explanations evaluation [1.1060425537315088]
We explore the potential of ensembling explanations generated by deep classification models using a convolutional model.
Through experimentation and analysis, we aim to investigate the implications of combining explanations to uncover more coherent and reliable patterns of the model's behavior.
arXiv Detail & Related papers (2024-04-16T08:39:29Z) - Understanding Disparities in Post Hoc Machine Learning Explanation [2.965442487094603]
Previous work has highlighted that existing post-hoc explanation methods exhibit disparities in explanation fidelity (across 'race' and 'gender' as sensitive attributes).
We specifically assess challenges to explanation disparities that originate from properties of the data.
Results indicate that disparities in model explanations can also depend on data and model properties.
arXiv Detail & Related papers (2024-01-25T22:09:28Z) - Counterfactuals of Counterfactuals: a back-translation-inspired approach
to analyse counterfactual editors [3.4253416336476246]
We focus on the analysis of counterfactual, contrastive explanations.
We propose a new back-translation-inspired evaluation methodology.
We show that by iteratively feeding the counterfactual to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models.
arXiv Detail & Related papers (2023-05-26T16:04:28Z) - OCTET: Object-aware Counterfactual Explanations [29.532969342297086]
We propose an object-centric framework for counterfactual explanation generation.
Our method, inspired by recent generative modeling works, encodes the query image into a latent space that is structured to ease object-level manipulations.
We conduct a set of experiments on counterfactual explanation benchmarks for driving scenes, and we show that our method can be adapted beyond classification.
arXiv Detail & Related papers (2022-11-22T16:23:12Z) - Sampling Based On Natural Image Statistics Improves Local Surrogate
Explainers [111.31448606885672]
Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a prediction.
We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
arXiv Detail & Related papers (2022-08-08T08:10:13Z) - Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z) - STEEX: Steering Counterfactual Explanations with Semantics [28.771471624014065]
Deep learning models are increasingly used in safety-critical applications.
For simple images, such as low-resolution face portraits, visual counterfactual explanations have recently been proposed.
We propose a new generative counterfactual explanation framework that produces plausible and sparse modifications.
arXiv Detail & Related papers (2021-11-17T13:20:29Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z) - This is not the Texture you are looking for! Introducing Novel
Counterfactual Explanations for Non-Experts using Generative Adversarial
Learning [59.17685450892182]
Counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image.
We present a novel approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques.
Our results show that our approach leads to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems.
arXiv Detail & Related papers (2020-12-22T10:08:05Z) - Model extraction from counterfactual explanations [68.8204255655161]
We show how an adversary can leverage the information provided by counterfactual explanations to build high-fidelity and high-accuracy model extraction attacks.
Our attack enables the adversary to build a faithful copy of a target model by accessing its counterfactual explanations.
arXiv Detail & Related papers (2020-09-03T19:02:55Z)