Transferable Universal Adversarial Perturbations Using Generative Models
- URL: http://arxiv.org/abs/2010.14919v2
- Date: Thu, 29 Oct 2020 15:19:41 GMT
- Title: Transferable Universal Adversarial Perturbations Using Generative Models
- Authors: Atiye Sadat Hashemi, Andreas Bär, Saeed Mozaffari, and Tim Fingscheidt
- Abstract summary: Image-agnostic perturbations, known as universal adversarial perturbations (UAPs), can fool deep neural networks with high confidence.
We propose a novel technique for generating more transferable UAPs.
We obtain an average fooling rate of 93.36% on the source models.
- Score: 29.52528162520099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks tend to be vulnerable to adversarial perturbations,
which, when added to a natural image, can fool the respective model with high
confidence. Recently, the existence of image-agnostic perturbations, also known
as universal adversarial perturbations (UAPs), was discovered. However,
existing UAPs still lack a sufficiently high fooling rate when applied to an
unknown target model. In this paper, we propose a novel deep learning
technique for generating more transferable UAPs. We utilize a perturbation
generator and several pretrained networks, so-called source models, to
generate UAPs using the ImageNet dataset. Since various model architectures
produce similar feature representations in their first layer, we propose a
loss formulation that focuses the adversarial energy only on the respective
first layer of the source models. This supports the transferability of our
generated UAPs to other target models. We further empirically analyze our
generated UAPs and demonstrate that these perturbations generalize very well
to different target models. Surpassing the current state of the art in both
fooling rate and model transferability, we show the superiority of our
proposed approach. Using our generated non-targeted UAPs, we obtain an
average fooling rate of 93.36% on the source models (state of the art: 82.16%).
Generating our UAPs on the deep ResNet-152, we obtain about a 12% absolute
fooling rate advantage over cutting-edge methods on the VGG-16 and VGG-19
target models.
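To make the first-layer loss formulation from the abstract concrete, below is a minimal, hedged PyTorch sketch, not the authors' released code. It concentrates the adversarial energy on the first convolutional layer of two ImageNet-pretrained source models and reports the resulting fooling rate. For brevity, the paper's perturbation generator is replaced by a single learnable image-agnostic tensor, input normalization is omitted, and the L_inf budget, helper names (first_conv, first_layer_loss, fooling_rate), and model choices are illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch/torchvision; NOT the authors' implementation.
import torch
import torch.nn as nn
import torchvision.models as models

def first_conv(model: nn.Module) -> nn.Module:
    # Return the first Conv2d layer of a torchvision model (assumption: this is
    # where the "first layer" features used by the loss are taken from).
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            return m
    raise ValueError("no Conv2d layer found")

# Frozen ImageNet-pretrained source models (the paper uses e.g. VGG and ResNet-152).
source_models = [models.vgg16(weights="IMAGENET1K_V1"),
                 models.resnet152(weights="IMAGENET1K_V1")]
for m in source_models:
    m.eval()
    for p in m.parameters():
        p.requires_grad_(False)
first_layers = [first_conv(m) for m in source_models]

# Toy UAP parameterization: one learnable image-agnostic tensor under an
# assumed L_inf budget (the paper trains a generator network instead).
epsilon = 10.0 / 255.0
uap = torch.zeros(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([uap], lr=1e-2)

def first_layer_loss(images: torch.Tensor) -> torch.Tensor:
    # Negative "adversarial energy" in the first layers: minimizing this loss
    # maximizes the change the UAP causes in the first-layer feature maps.
    delta = torch.clamp(uap, -epsilon, epsilon)
    loss = torch.zeros(())
    for layer in first_layers:
        clean = layer(images)
        adv = layer(torch.clamp(images + delta, 0.0, 1.0))
        loss = loss - (adv - clean).pow(2).mean()
    return loss

def fooling_rate(model: nn.Module, images: torch.Tensor, delta: torch.Tensor) -> float:
    # Fraction of images whose predicted class flips once the UAP is added.
    with torch.no_grad():
        clean_pred = model(images).argmax(dim=1)
        adv_pred = model(torch.clamp(images + delta, 0.0, 1.0)).argmax(dim=1)
    return (clean_pred != adv_pred).float().mean().item()

# One illustrative optimization step on a random batch (a stand-in for ImageNet).
images = torch.rand(8, 3, 224, 224)
optimizer.zero_grad()
first_layer_loss(images).backward()
optimizer.step()
print("fooling rate:", fooling_rate(source_models[0], images,
                                    torch.clamp(uap.detach(), -epsilon, epsilon)))
```

In practice the loop would run over many ImageNet batches, and a generator network producing the perturbation would replace the single learnable tensor; the sketch only illustrates where the first-layer activations enter the loss.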
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- Texture Re-scalable Universal Adversarial Perturbation [61.33178492209849]
We propose texture scale-constrained UAP, which automatically generates UAPs with category-specific local textures.
TSC-UAP achieves a considerable improvement in the fooling ratio and attack transferability for both data-dependent and data-free UAP methods.
arXiv Detail & Related papers (2024-06-10T08:18:55Z)
- Mixture of Low-rank Experts for Transferable AI-Generated Image Detection [18.631006488565664]
Generative models have made a giant leap in producing photo-realistic images with minimal expertise, sparking concerns about the authenticity of online information.
This study aims to develop a universal AI-generated image detector capable of identifying images from diverse sources.
Inspired by the zero-shot transferability of pre-trained vision-language models, we seek to harness the non-trivial visual-world knowledge and descriptive proficiency of CLIP-ViT to generalize over unknown domains.
arXiv Detail & Related papers (2024-04-07T09:01:50Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
- Class-Prototype Conditional Diffusion Model with Gradient Projection for Continual Learning [20.175586324567025]
Mitigating catastrophic forgetting is a key hurdle in continual learning.
A major issue is the deterioration in the quality of generated data compared to the original.
We propose a generative replay (GR)-based approach for continual learning that enhances image quality in generators.
arXiv Detail & Related papers (2023-12-10T17:39:42Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Enhancing Targeted Attack Transferability via Diversified Weight Pruning [0.3222802562733786]
Malicious attackers can generate targeted adversarial examples by imposing human-imperceptible noise on images.
With cross-model transferable adversarial examples, the vulnerability of neural networks remains even if the model information is kept secret from the attacker.
Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples.
arXiv Detail & Related papers (2022-08-18T07:25:48Z)
- Learning to Generate Image Source-Agnostic Universal Adversarial Perturbations [65.66102345372758]
A universal adversarial perturbation (UAP) can simultaneously attack multiple images.
The existing UAP generator is underdeveloped when images are drawn from different image sources.
We take a novel view of UAP generation as a customized instance of few-shot learning.
arXiv Detail & Related papers (2020-09-29T01:23:20Z)
- GAP++: Learning to generate target-conditioned adversarial examples [28.894143619182426]
Adversarial examples are perturbed inputs that pose a serious threat to machine learning models.
We propose a more general-purpose framework which infers target-conditioned perturbations dependent on both input image and target label.
Our method achieves superior performance with single target attack models and obtains high fooling rates with small perturbation norms.
arXiv Detail & Related papers (2020-06-09T07:49:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.