Unifying Image Counterfactuals and Feature Attributions with Latent-Space Adversarial Attacks
- URL: http://arxiv.org/abs/2504.15479v1
- Date: Mon, 21 Apr 2025 23:09:30 GMT
- Title: Unifying Image Counterfactuals and Feature Attributions with Latent-Space Adversarial Attacks
- Authors: Jeremy Goldwasser, Giles Hooker
- Abstract summary: We introduce a new, easy-to-implement framework for counterfactual images. Our method resembles an adversarial attack on the representation of the image along a low-dimensional manifold. We show how to accompany counterfactuals with feature attributions that quantify the changes between the original and counterfactual images.
- Score: 3.8642937395065124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactuals are a popular framework for interpreting machine learning predictions. These what-if explanations are notoriously challenging to create for computer vision models: standard gradient-based methods are prone to produce adversarial examples, in which imperceptible modifications to image pixels provoke large changes in predictions. We introduce a new, easy-to-implement framework for counterfactual images that can flexibly adapt to contemporary advances in generative modeling. Our method, Counterfactual Attacks, resembles an adversarial attack on the representation of the image along a low-dimensional manifold. In addition, given an auxiliary dataset of image descriptors, we show how to accompany counterfactuals with feature attributions that quantify the changes between the original and counterfactual images. These importance scores can be aggregated into global counterfactual explanations that highlight the overall features driving model predictions. While this unification is possible for any counterfactual method, it has particular computational efficiency for ours. We demonstrate the efficacy of our approach with the MNIST and CelebA datasets.
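As a rough illustration of the idea described in the abstract, the sketch below performs an adversarial-style attack in the latent space of a pretrained generative model rather than on raw pixels, then reads off attributions as changes in auxiliary descriptor scores. It assumes PyTorch-style encoder, decoder, classifier, and descriptor modules; all names, hyperparameters, and the exact loss are illustrative assumptions, not taken from the paper.

```python
import torch

def counterfactual_attack(x, encoder, decoder, classifier, target_class,
                          steps=200, lr=0.05):
    """Sketch: move an image's latent code until the classifier's prediction flips.

    x            -- input image tensor, shape (1, C, H, W)
    encoder      -- maps images to latent codes (e.g., a VAE/GAN encoder)
    decoder      -- maps latent codes back to images
    classifier   -- model whose prediction we want to change
    target_class -- integer index of the desired counterfactual class
    """
    z = encoder(x).detach().clone().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    target = torch.tensor([target_class], device=x.device)

    for _ in range(steps):
        optimizer.zero_grad()
        x_cf = decoder(z)                      # candidate counterfactual image
        logits = classifier(x_cf)
        # Cross-entropy toward the target class. Because the perturbation lives
        # in the latent space, the decoded image stays on (or near) the
        # generator's manifold instead of becoming a pixel-level adversarial
        # example.
        loss = torch.nn.functional.cross_entropy(logits, target)
        loss.backward()
        optimizer.step()

    return decoder(z).detach()


def descriptor_attributions(x, x_cf, descriptor):
    """Feature attributions as the change in auxiliary descriptor scores
    (e.g., attribute predictors trained on a side dataset) between the
    original image and its counterfactual. Illustrative only."""
    with torch.no_grad():
        return descriptor(x_cf) - descriptor(x)
```

In this sketch, aggregating `descriptor_attributions` over many images would give the kind of global, feature-level summary of what drives the model's predictions that the abstract describes.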
Related papers
- A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation. Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity. This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
arXiv Detail & Related papers (2025-04-09T11:46:41Z)
- Diffusion Counterfactuals for Image Regressors [1.534667887016089]
We present two methods to create counterfactual explanations for image regression tasks using diffusion-based generative models. Both produce realistic, semantic, and smooth counterfactuals on CelebA-HQ and a synthetic data set. We find that for regression counterfactuals, changes in features depend on the region of the predicted value.
arXiv Detail & Related papers (2025-03-26T14:42:46Z)
- Data Augmentation via Latent Diffusion for Saliency Prediction [67.88936624546076]
Saliency prediction models are constrained by the limited diversity and quantity of labeled data.
We propose a novel data augmentation method for deep saliency prediction that edits natural images while preserving the complexity and variability of real-world scenes.
arXiv Detail & Related papers (2024-09-11T14:36:24Z)
- IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks [16.577595936609665]
We introduce a novel approach to counter adversarial attacks, namely, image resampling.
Image resampling transforms a discrete image into a new one, simulating the process of scene recapturing or rerendering as specified by a geometrical transformation.
We show that our method significantly enhances the adversarial robustness of diverse deep models against various attacks while maintaining high accuracy on clean images.
arXiv Detail & Related papers (2023-10-18T11:19:32Z)
- Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z)
- Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
- Designing Counterfactual Generators using Deep Model Inversion [31.1607056675927]
We develop a deep inversion approach to generate counterfactual explanations for a given query image.
We find that, in addition to producing visually meaningful explanations, the counterfactuals from DISC are effective at learning decision boundaries and are robust to unknown test-time corruptions.
arXiv Detail & Related papers (2021-09-29T08:40:50Z)
- Explainers in the Wild: Making Surrogate Explainers Robust to Distortions through Perception [77.34726150561087]
We propose a methodology to evaluate the effect of distortions in explanations by embedding perceptual distances.
We generate explanations for images in the ImageNet-C dataset and demonstrate how using perceptual distances in the surrogate explainer creates more coherent explanations for the distorted and reference images.
arXiv Detail & Related papers (2021-02-22T12:38:53Z)
- Unsupervised Discovery of Disentangled Manifolds in GANs [74.24771216154105]
An interpretable generation process is beneficial to various image editing applications.
We propose a framework to discover interpretable directions in the latent space given arbitrary pre-trained generative adversarial networks.
arXiv Detail & Related papers (2020-11-24T02:18:08Z)
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
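For the feature-statistics perturbation described in the AdvBN entry above, the sketch below illustrates the general idea, not the authors' implementation: the per-channel mean and standard deviation of an intermediate feature map are rescaled by adversarially optimized factors so as to maximize the training loss, with the result fed to the rest of the network. Function names, the projection scheme, and all hyperparameters are hypothetical.

```python
import torch

def perturb_feature_statistics(features, labels, model_head,
                               eps=0.2, steps=3, lr=0.1):
    """Rough sketch: adversarially rescale per-channel feature statistics
    (mean and std) to maximize the downstream loss, rather than perturbing
    pixels. Illustrative assumptions throughout.

    features   -- intermediate activations, shape (N, C, H, W)
    labels     -- ground-truth labels, shape (N,)
    model_head -- the remainder of the network, mapping features to logits
    """
    features = features.detach()
    n, c = features.shape[:2]
    mean = features.mean(dim=(2, 3), keepdim=True)
    std = features.std(dim=(2, 3), keepdim=True) + 1e-5
    normalized = (features - mean) / std

    # Multiplicative perturbations of the statistics, kept in [1-eps, 1+eps].
    gamma = torch.ones(n, c, 1, 1, requires_grad=True)
    beta = torch.ones(n, c, 1, 1, requires_grad=True)

    for _ in range(steps):
        perturbed = normalized * (std * gamma) + mean * beta
        loss = torch.nn.functional.cross_entropy(model_head(perturbed), labels)
        grad_g, grad_b = torch.autograd.grad(loss, [gamma, beta])
        with torch.no_grad():
            # Gradient ascent on the loss, then project back into the eps ball.
            gamma += lr * grad_g.sign()
            beta += lr * grad_b.sign()
            gamma.clamp_(1 - eps, 1 + eps)
            beta.clamp_(1 - eps, 1 + eps)

    return (normalized * (std * gamma) + mean * beta).detach()
```

Training on such worst-case statistic shifts is meant to make the model less sensitive to style-like changes in feature distributions than standard pixel-space adversarial training.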
This list is automatically generated from the titles and abstracts of the papers on this site.