Adaptive Clustering of Robust Semantic Representations for Adversarial
Image Purification
- URL: http://arxiv.org/abs/2104.02155v2
- Date: Wed, 7 Apr 2021 15:22:42 GMT
- Title: Adaptive Clustering of Robust Semantic Representations for Adversarial
Image Purification
- Authors: Samuel Henrique Silva, Arun Das, Ian Scarff, Peyman Najafirad
- Abstract summary: We propose a robust defense against adversarial attacks which is model-agnostic and generalizable to unseen adversaries.
In this paper, we extract the latent representations for each class and adaptively cluster the latent representations that share a semantic similarity.
We adversarially train a new model constraining the latent space representation to minimize the distance between the adversarial latent representation and the true cluster distribution.
- Score: 0.9203366434753543
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep Learning models are highly susceptible to adversarial manipulations that
can lead to catastrophic consequences. One of the most effective methods to
defend against such disturbances is adversarial training, but it comes at the cost of generalization to unseen attacks and transferability across models. In this paper, we propose a robust defense against adversarial attacks which is model-agnostic and generalizable to unseen adversaries. Initially, with a baseline
model, we extract the latent representations for each class and adaptively
cluster the latent representations that share a semantic similarity. We obtain
the distributions for the clustered latent representations and from their
originating images, we learn semantic reconstruction dictionaries (SRD). We
adversarially train a new model constraining the latent space representation to
minimize the distance between the adversarial latent representation and the
true cluster distribution. To purify the image, we decompose the input into low
and high-frequency components. The high-frequency component is reconstructed
based on the most adequate SRD from the clean dataset. To select the most adequate SRD, we rely on the distance between robust latent representations and the semantic cluster distributions. The output is a purified
image with no perturbation. Image purification on CIFAR-10 and ImageNet-10
using our proposed method improved the accuracy by more than 10% compared to
state-of-the-art results.
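
The pipeline described in the abstract has three moving parts: adaptive clustering of per-class latent representations, a low/high-frequency split of the input, and selection of a semantic reconstruction dictionary (SRD) by latent distance. The sketch below illustrates those steps under stated assumptions; it is not the authors' code. The feature extractor, the use of MeanShift as the "adaptive" clustering step, the Gaussian (Mahalanobis) cluster model, and the radial FFT cutoff are all illustrative choices.

```python
# Minimal sketch of the purification pipeline, not the authors' code.
# Assumptions: a trained encoder producing (N, D) latents, MeanShift as
# the adaptive clustering step, one Gaussian per cluster, and a radial
# FFT cutoff for the low/high-frequency split.
import numpy as np
from scipy.spatial.distance import mahalanobis
from sklearn.cluster import MeanShift

def cluster_latents(latents):
    """Adaptively cluster latent vectors; MeanShift needs no preset k."""
    labels = MeanShift().fit(latents).labels_
    stats = []
    for k in np.unique(labels):
        members = latents[labels == k]
        mu = members.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(members, rowvar=False))
        stats.append((mu, cov_inv))
    return stats  # one (mean, inverse covariance) per semantic cluster

def split_frequencies(image, cutoff=8):
    """Split a 2-D (grayscale) image into low/high-frequency parts."""
    spec = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    lowpass = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= cutoff ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spec * lowpass)).real
    return low, image - low

def pick_srd(latent, stats):
    """Index of the semantic cluster (hence SRD) nearest in latent space."""
    return int(np.argmin([mahalanobis(latent, mu, cov_inv)
                          for mu, cov_inv in stats]))
```

The chosen index selects the SRD learned from clean images of that semantic cluster; the high-frequency component is re-synthesized from that dictionary while the low-frequency component is kept, which yields the purified image.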
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a defense technique that can defend against various unseen adversarial attacks.
We present a simple adversarial purification method without further training to purify adversarial images, called ZeroPur.
arXiv Detail & Related papers (2024-06-05T10:58:15Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose a unified framework, Adv-Diffusion, that generates imperceptible adversarial identity perturbations in the latent space rather than the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - Counterfactual Image Generation for adversarially robust and
- Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z) - Improving Adversarial Robustness of Masked Autoencoders via Test-time
- Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE).
A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods.
We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z) - Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness [1.2289361708127877]
CARSO is able to defend itself against adaptive end-to-end white-box attacks devised specifically for such defences.
Our method improves the state of the art for CIFAR-10, CIFAR-100, and TinyImageNet-200 by a significant margin.
arXiv Detail & Related papers (2023-05-25T09:04:31Z) - Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z) - Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
- Optimal Transport as a Defense Against Adversarial Attacks [4.6193503399184275]
Adversarial attacks can find a human-imperceptible perturbation for a given image that will mislead a trained model.
Previous work aimed to align original and adversarial image representations, in the manner of domain adaptation, to improve robustness.
We propose to use a loss between distributions that faithfully reflects the ground distance.
This leads to SAT (Sinkhorn Adversarial Training), a more robust defense against adversarial attacks.
arXiv Detail & Related papers (2021-02-05T13:24:36Z) - Stylized Adversarial Defense [105.88250594033053]
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z) - Robust Face Verification via Disentangled Representations [20.393894616979402]
- Robust Face Verification via Disentangled Representations [20.393894616979402]
We introduce a robust algorithm for face verification, deciding whether two images are of the same person or not.
We use the generative model during training as an online augmentation method instead of as a test-time purifier that removes adversarial noise.
We experimentally show that, when coupled with adversarial training, the proposed scheme converges with a weak inner solver and has higher clean and robust accuracy than state-of-the-art methods when evaluated against white-box physical attacks.
arXiv Detail & Related papers (2020-06-05T19:17:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.