Explaining Image Classifiers Using Contrastive Counterfactuals in
Generative Latent Spaces
- URL: http://arxiv.org/abs/2206.05257v1
- Date: Fri, 10 Jun 2022 17:54:46 GMT
- Title: Explaining Image Classifiers Using Contrastive Counterfactuals in
Generative Latent Spaces
- Authors: Kamran Alipour, Aditya Lahiri, Ehsan Adeli, Babak Salimi, Michael
Pazzani
- Abstract summary: We introduce a novel method to generate causal and yet interpretable counterfactual explanations for image classifiers.
We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers.
- Score: 12.514483749037998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their high accuracies, modern complex image classifiers cannot be
trusted for sensitive tasks due to their unknown decision-making process and
potential biases. Counterfactual explanations are very effective in providing
transparency for these black-box algorithms. Nevertheless, generating
counterfactuals that can have a consistent impact on classifier outputs and yet
expose interpretable feature changes is a very challenging task. We introduce a
novel method to generate causal and yet interpretable counterfactual
explanations for image classifiers using pretrained generative models without
any re-training or conditioning. The generative models in this technique are
not bound to be trained on the same data as the target classifier. We use this
framework to obtain contrastive and causal sufficiency and necessity scores as
global explanations for black-box classifiers. On the task of face attribute
classification, we show how different attributes influence the classifier
output by providing both causal and contrastive feature attributions, and the
corresponding counterfactual images.
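As a rough illustration of the sufficiency and necessity scores mentioned in the abstract, the sketch below estimates both from classifier predictions on original images and on their counterfactuals. It assumes the standard counterfactual reading of these quantities (necessity: how often removing an attribute flips a positive prediction; sufficiency: how often adding it flips a negative one). The paper's exact definitions are not given here, and the function names and binary encoding are hypothetical.

```python
from typing import Sequence


def necessity_score(attr: Sequence[int], pred: Sequence[int],
                    pred_cf_off: Sequence[int]) -> float:
    """Among samples that have the attribute (attr == 1) and a positive
    prediction (pred == 1), return the fraction whose prediction becomes
    negative on the counterfactual with the attribute removed.

    Assumed (hypothetical) encoding: every sequence holds 0/1 values and
    pred_cf_off[i] is the classifier output on the counterfactual of sample i.
    """
    eligible = [i for i in range(len(attr)) if attr[i] == 1 and pred[i] == 1]
    if not eligible:
        return float("nan")  # score is undefined without eligible samples
    flipped = sum(1 for i in eligible if pred_cf_off[i] == 0)
    return flipped / len(eligible)


def sufficiency_score(attr: Sequence[int], pred: Sequence[int],
                      pred_cf_on: Sequence[int]) -> float:
    """Among samples that lack the attribute (attr == 0) and have a negative
    prediction (pred == 0), return the fraction whose prediction becomes
    positive on the counterfactual with the attribute added.
    """
    eligible = [i for i in range(len(attr)) if attr[i] == 0 and pred[i] == 0]
    if not eligible:
        return float("nan")  # score is undefined without eligible samples
    flipped = sum(1 for i in eligible if pred_cf_on[i] == 1)
    return flipped / len(eligible)


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    attr = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 0, 0, 1]
    pred_cf_off = [0, 1, 0, 0, 0, 1]  # predictions with the attribute removed
    pred_cf_on = [1, 1, 1, 1, 0, 1]   # predictions with the attribute added
    print(necessity_score(attr, pred, pred_cf_off))
    print(sufficiency_score(attr, pred, pred_cf_on))
```

In a setting like the one described, the counterfactual predictions would come from running the classifier on images edited in the generative model's latent space; here they are plain lists so the sketch stays self-contained.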
Related papers
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Improving Fairness using Vision-Language Driven Image Augmentation [60.428157003498995]
Fairness is crucial when training a deep-learning discriminative model, especially in the facial domain.
Models tend to correlate specific characteristics (such as age and skin color) with unrelated attributes (downstream tasks).
This paper proposes a method to mitigate these correlations to improve fairness.
arXiv Detail & Related papers (2023-11-02T19:51:10Z)
- Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z)
- CLIMAX: An exploration of Classifier-Based Contrastive Explanations [5.381004207943597]
We propose a novel post-hoc model XAI technique that provides contrastive explanations justifying the classification of a black box.
Our method, which we refer to as CLIMAX, is based on local classifiers.
We show that we achieve better consistency as compared to baselines such as LIME, BayLIME, and SLIME.
arXiv Detail & Related papers (2023-07-02T22:52:58Z)
- Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images are: (i) more accurate and of higher quality than standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z)
- Counterfactual Generation Under Confounding [24.503075567519048]
A machine learning model, under the influence of observed or unobserved confounders in the training data, can learn spurious correlations.
We propose a counterfactual generation method that learns to modify the value of any attribute in an image and generate new images given a set of observed attributes.
Our method is computationally efficient, simple to implement, and works well for any number of generative factors and confounding variables.
arXiv Detail & Related papers (2022-10-22T06:39:22Z)
- Causal Transportability for Visual Recognition [70.13627281087325]
We show that standard classifiers fail because the association between images and labels is not transportable across settings.
We then show that the causal effect, which severs all sources of confounding, remains invariant across domains.
This motivates us to develop an algorithm to estimate the causal effect for image classification.
arXiv Detail & Related papers (2022-04-26T15:02:11Z)
- Understanding invariance via feedforward inversion of discriminatively trained classifiers [30.23199531528357]
Past research has discovered that some extraneous visual detail remains in the output logits.
We develop a feedforward inversion model that produces remarkably high fidelity reconstructions.
Our approach is based on BigGAN, with conditioning on logits instead of one-hot class labels.
arXiv Detail & Related papers (2021-03-15T17:56:06Z)
- Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- Evaluating and Mitigating Bias in Image Classifiers: A Causal Perspective Using Counterfactuals [27.539001365348906]
We present a method for generating counterfactuals by incorporating a structural causal model (SCM) in an improved variant of Adversarially Learned Inference (ALI).
We show how to explain a pre-trained machine learning classifier, evaluate its bias, and mitigate the bias using a counterfactual regularizer.
arXiv Detail & Related papers (2020-09-17T13:19:31Z)