Counterfactual Image Generation for adversarially robust and
interpretable Classifiers
- URL: http://arxiv.org/abs/2310.00761v1
- Date: Sun, 1 Oct 2023 18:50:29 GMT
- Title: Counterfactual Image Generation for adversarially robust and
interpretable Classifiers
- Authors: Rafael Bischof, Florian Scheidegger, Michael A. Kraus, A. Cristiano I.
Malossi
- Abstract summary: We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show that the model exhibits improved robustness to adversarial attacks and that the discriminator's "fakeness" value serves as an uncertainty measure for its predictions.
- Score: 1.3859669037499769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Image Classifiers are effective but inherently hard to interpret and
susceptible to adversarial attacks. Solutions to both problems exist, among
others, in the form of counterfactual example generation to enhance
explainability or adversarial augmentation of training datasets for improved
robustness. However, existing methods address only one of these
issues. We propose a unified framework leveraging image-to-image translation
Generative Adversarial Networks (GANs) to produce counterfactual samples that
highlight salient regions for interpretability and act as adversarial samples
to augment the dataset for more robustness. This is achieved by combining the
classifier and discriminator into a single model that attributes real images to
their respective classes and flags generated images as "fake". We assess the
method's effectiveness by evaluating (i) the produced explainability masks on a
semantic segmentation task for concrete cracks and (ii) the model's resilience
against the Projected Gradient Descent (PGD) attack on a fruit defects
detection problem. Our produced saliency maps are highly descriptive, achieving
competitive IoU values compared to classical segmentation models despite being
trained exclusively on classification labels. Furthermore, the model exhibits
improved robustness to adversarial attacks, and we show how the discriminator's
"fakeness" value serves as an uncertainty measure of the predictions.
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility.
Specifically, we propose a novel adversarial semantic mask generative model, which can constrain generated perturbations in local semantic regions for good stealthiness.
arXiv Detail & Related papers (2024-06-16T10:38:11Z)
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
- Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness [4.339574774938128]
We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models.
Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment.
arXiv Detail & Related papers (2023-10-28T07:40:42Z)
- On Evaluating the Adversarial Robustness of Semantic Segmentation Models [0.0]
A number of adversarial training approaches have been proposed as a defense against adversarial perturbation.
We show for the first time that a number of models in previous work that are claimed to be robust are in fact not robust at all.
We then evaluate simple adversarial training algorithms that produce reasonably robust models even under our set of strong attacks.
arXiv Detail & Related papers (2023-06-25T11:45:08Z)
- Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation [16.109860499330562]
We introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation.
We demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
arXiv Detail & Related papers (2023-05-22T08:36:35Z)
- Semantic Image Attack for Visual Model Diagnosis [80.36063332820568]
In practice, metric analysis on a specific train and test dataset does not guarantee reliable or fair ML models.
This paper proposes Semantic Image Attack (SIA), a method based on the adversarial attack that provides semantic adversarial images.
arXiv Detail & Related papers (2023-03-23T03:13:04Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Evaluating and Mitigating Bias in Image Classifiers: A Causal Perspective Using Counterfactuals [27.539001365348906]
We present a method for generating counterfactuals by incorporating a structural causal model (SCM) in an improved variant of Adversarially Learned Inference (ALI).
We show how to explain a pre-trained machine learning classifier, evaluate its bias, and mitigate the bias using a counterfactual regularizer.
arXiv Detail & Related papers (2020-09-17T13:19:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.