Type I Attack for Generative Models
- URL: http://arxiv.org/abs/2003.01872v1
- Date: Wed, 4 Mar 2020 03:20:59 GMT
- Title: Type I Attack for Generative Models
- Authors: Chengjin Sun, Sizhe Chen, Jia Cai, Xiaolin Huang
- Abstract summary: We propose a Type I attack on generative models such as VAEs and GANs.
Our attack method is effective at generating Type I adversarial examples for generative models on large-scale image datasets.
- Score: 16.525823302000877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models are popular tools with a wide range of applications.
Nevertheless, they are as vulnerable to adversarial samples as classifiers are.
Existing attack methods mainly focus on generating adversarial examples by
adding imperceptible perturbations to the input, which lead to wrong results.
However, we focus on another aspect of attack, i.e., cheating models with
significant changes. The former induces Type II errors and the latter causes
Type I errors. In this paper, we propose a Type I attack on generative models
such as VAEs and GANs. One example for a VAE is that we can change an original
image significantly into a meaningless one while its reconstruction remains
similar to that of the original. To implement the Type I attack, we destroy the
original image by increasing the distance in the input space while keeping the
output similar, exploiting the property of deep neural networks that very
different inputs may correspond to similar features. Experimental results show
that our attack method is effective at generating Type I adversarial examples
for generative models on large-scale image datasets.
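In short, the attack searches for an input that is far from the original in the input space but maps to a similar output. Below is a minimal sketch of this objective for a VAE, assuming a PyTorch-style model with hypothetical `encode`/`decode` methods and illustrative hyperparameters; it is a reconstruction of the idea described in the abstract, not the authors' released code.

```python
# Sketch of a Type I attack objective on a VAE (illustrative only).
# Assumes `vae.encode(x)` returns (mu, logvar) and `vae.decode(mu)` returns an image.
import torch

def type_i_attack(vae, x_orig, steps=500, lr=0.01, lam=10.0):
    """Push x_adv far from x_orig in input space while keeping the
    VAE reconstruction close to the original reconstruction."""
    vae.eval()
    with torch.no_grad():
        mu, _ = vae.encode(x_orig)
        recon_orig = vae.decode(mu)          # reference reconstruction

    x_adv = x_orig.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)

    for _ in range(steps):
        mu_adv, _ = vae.encode(x_adv)
        recon_adv = vae.decode(mu_adv)
        # Maximize input-space distance, penalize output-space distance.
        input_dist = torch.norm(x_adv - x_orig)
        output_dist = torch.norm(recon_adv - recon_orig)
        loss = -input_dist + lam * output_dist
        opt.zero_grad()
        loss.backward()
        opt.step()
        x_adv.data.clamp_(0.0, 1.0)          # keep a valid image

    return x_adv.detach()
```

With a suitable weight `lam`, the optimization drives `x_adv` toward a heavily distorted image whose reconstruction still resembles that of the original, which is the Type I behavior described above.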
Related papers
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection [83.72430401516674]
GAKer is able to construct adversarial examples for any target class.
Our method achieves an approximately 14.13% higher attack success rate for unknown classes.
arXiv Detail & Related papers (2024-07-17T03:24:09Z)
- Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z)
- A New Kind of Adversarial Example [47.64219291655723]
A large enough perturbation is added to an image such that a model maintains its original decision, whereas a human will most likely make a mistake if forced to decide.
Our proposed attack, dubbed NKE, is similar in essence to the fooling images, but is more efficient since it uses gradient descent instead of evolutionary algorithms.
arXiv Detail & Related papers (2022-08-04T03:45:44Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Differentiable Language Model Adversarial Attacks on Categorical Sequence Classifiers [0.0]
An adversarial attack paradigm explores various scenarios for the vulnerability of deep learning models.
We fine-tune a language model and use it as a generator of adversarial examples.
Our model works for diverse datasets on bank transactions, electronic health records, and NLP datasets.
arXiv Detail & Related papers (2020-06-19T11:25:36Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked model as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
- AdvJND: Generating Adversarial Examples with Just Noticeable Difference [3.638233924421642]
Adding small perturbations to examples causes a well-performing model to misclassify the crafted examples.
Adversarial examples generated by our AdvJND algorithm yield distributions similar to those of the original inputs.
arXiv Detail & Related papers (2020-02-01T09:55:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.