Generating Out of Distribution Adversarial Attack using Latent Space
Poisoning
- URL: http://arxiv.org/abs/2012.05027v1
- Date: Wed, 9 Dec 2020 13:05:44 GMT
- Title: Generating Out of Distribution Adversarial Attack using Latent Space
Poisoning
- Authors: Ujjwal Upadhyay and Prerana Mukherjee
- Abstract summary: We propose a novel mechanism for generating adversarial examples where the actual image is not corrupted.
Instead, the latent space representation is utilized to tamper with the inherent structure of the image.
As opposed to gradient-based attacks, the latent space poisoning exploits the inclination of classifiers to model the independent and identical distribution of the training dataset.
- Score: 5.1314136039587925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional adversarial attacks rely upon perturbations generated by
gradients from the network, which are generally safeguarded by gradient-guided
search to provide an adversarial counterpart to the network. In this paper, we
propose a novel mechanism for generating adversarial examples where the actual
image is not corrupted; rather, its latent space representation is utilized to
tamper with the inherent structure of the image while keeping the perceptual
quality intact, so that it acts as a legitimate data sample. As opposed to
gradient-based attacks, latent space poisoning exploits the inclination of
classifiers to model the independent and identical distribution of the training
dataset and tricks them by producing out-of-distribution samples. We train a
disentangled variational autoencoder (beta-VAE) to model the data in latent
space and then add noise perturbations to the latent space, drawn from a
class-conditioned distribution function, under the constraint that the result
is misclassified to the target label. Our empirical results on the MNIST, SVHN,
and CelebA datasets validate that the generated adversarial examples can easily
fool robust $l_0$, $l_2$, $l_\infty$ norm classifiers designed using provably
robust defense mechanisms.
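A minimal sketch of the attack pipeline described above, assuming a pretrained beta-VAE (`encoder`, `decoder`) and a victim `classifier`; the class-conditioned noise parameters `mu_c` and `sigma_c` and the optimization loop are illustrative placeholders, not the authors' released implementation:

```python
# Hypothetical latent space poisoning sketch (PyTorch).
# encoder, decoder, classifier are assumed pretrained modules; the
# class-conditioned Gaussian (mu_c, sigma_c) stands in for the paper's
# class-conditioned distribution function.
import torch
import torch.nn.functional as F

def latent_space_poison(x, target_label, encoder, decoder, classifier,
                        mu_c, sigma_c, steps=100, lr=0.05):
    with torch.no_grad():
        z = encoder(x)                                 # latent code of the clean image
        delta = mu_c + sigma_c * torch.randn_like(z)   # class-conditioned initialization
    delta = delta.requires_grad_(True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = decoder(z + delta)                     # decode the poisoned latent code
        logits = classifier(x_adv)
        # push the decoded sample toward the target label ...
        cls_loss = F.cross_entropy(logits, target_label)
        # ... while keeping the latent perturbation small to preserve perceptual quality
        loss = cls_loss + 0.1 * delta.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z + delta).detach()
```

The "misclassified to the target label" constraint is enforced here as a soft cross-entropy objective; the paper's exact constraint formulation and noise model may differ.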
Related papers
- Certified $\ell_2$ Attribution Robustness via Uniformly Smoothed Attributions [20.487079380753876]
We propose a uniform smoothing technique that augments the vanilla attributions by noises uniformly sampled from a certain space.
It is proved that, for all perturbations within the attack region, the cosine similarity between uniformly smoothed attribution of perturbed sample and the unperturbed sample is guaranteed to be lower bounded.
arXiv Detail & Related papers (2024-05-10T09:56:02Z)
- Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional Diffusion Model, results in the natural adversarial sample misclassified by the Model.
Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z)
- A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications [11.39717289910264]
Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution.
GANs tend to embed near-duplicates of real samples in the latent space.
We propose a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models.
arXiv Detail & Related papers (2023-07-06T13:35:48Z)
- Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples.
Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z)
- Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z)
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned from either the unbalanced data, or through mode collapse.
We argue that this bias is responsible not only for fairness concerns, but that it plays a key role in the collapse of latent-traversal editing methods when deviating away from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- Regularization with Latent Space Virtual Adversarial Training [4.874780144224057]
Virtual Adversarial Training (VAT) has shown impressive results among recently developed regularization methods.
We propose LVAT, which injects perturbation in the latent space instead of the input space.
LVAT can generate adversarial samples flexibly, resulting in more adverse effects and thus more effective regularization.
arXiv Detail & Related papers (2020-11-26T08:51:38Z)
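For contrast with the input-space VAT that LVAT extends, here is a rough latent-space consistency regularizer in the spirit of the entry above; the single power-iteration step mirrors standard VAT, and the `encoder`, `decoder`, and `classifier` modules are assumed, so this is a sketch rather than the paper's exact procedure:

```python
# Illustrative latent-space virtual adversarial training (LVAT-style) loss.
import torch
import torch.nn.functional as F

def lvat_loss(x, encoder, decoder, classifier, xi=1e-6, eps=1.0):
    with torch.no_grad():
        p_clean = F.softmax(classifier(x), dim=1)   # predictions on the clean input
        z = encoder(x)                              # latent code of the clean input
    # one power-iteration step to approximate the most adverse latent direction
    d = torch.randn_like(z)
    d = (xi * d / d.norm(dim=1, keepdim=True)).requires_grad_(True)
    p_pert = F.log_softmax(classifier(decoder(z + d)), dim=1)
    kl = F.kl_div(p_pert, p_clean, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    r_adv = eps * grad / grad.norm(dim=1, keepdim=True)
    # consistency loss between the clean prediction and the prediction on the
    # image decoded from the adversarially perturbed latent code
    p_adv = F.log_softmax(classifier(decoder(z + r_adv)), dim=1)
    return F.kl_div(p_adv, p_clean, reduction="batchmean")
```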
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.