Adversarial Attacks are Reversible with Natural Supervision
- URL: http://arxiv.org/abs/2103.14222v2
- Date: Mon, 29 Mar 2021 02:34:39 GMT
- Title: Adversarial Attacks are Reversible with Natural Supervision
- Authors: Chengzhi Mao, Mia Chiquier, Hao Wang, Junfeng Yang, Carl Vondrick
- Abstract summary: Images contain intrinsic structure that enables the reversal of many adversarial attacks.
We demonstrate that modifying the attacked image to restore the natural structure will reverse many types of attacks.
Our results suggest deep networks are vulnerable to adversarial examples partly because their representations do not enforce the natural structure of images.
- Score: 28.61536318614705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We find that images contain intrinsic structure that enables the reversal of many adversarial attacks. Attack vectors cause not only image classifiers to fail, but also collaterally disrupt incidental structure in the image. We demonstrate that modifying the attacked image to restore the natural structure will reverse many types of attacks, providing a defense. Experiments demonstrate significantly improved robustness for several state-of-the-art models across the CIFAR-10, CIFAR-100, SVHN, and ImageNet datasets. Our results show that our defense is still effective even if the attacker is aware of the defense mechanism. Since our defense is deployed during inference instead of training, it is compatible with pre-trained networks as well as most other defenses. Our results suggest deep networks are vulnerable to adversarial examples partly because their representations do not enforce the natural structure of images.
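To make the defense concrete, here is a minimal PyTorch-style sketch of the inference-time reversal the abstract describes: a small corrective perturbation is optimized so that a self-supervised measure of natural image structure is restored before classification. The `natural_structure_loss` callable, the step schedule, and the budget `epsilon` are illustrative assumptions standing in for the paper's exact objective.

```python
import torch

def reverse_attack(x, classifier, natural_structure_loss,
                   steps=10, step_size=2 / 255, epsilon=8 / 255):
    """Sketch of an inference-time reversal defense: optimize a small
    corrective perturbation r so the image better satisfies a
    self-supervised natural-structure objective, then classify x + r.

    `natural_structure_loss` is a hypothetical callable scoring how
    strongly a batch of images violates natural image structure
    (lower is better); it stands in for the paper's natural supervision.
    """
    r = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = natural_structure_loss(x + r)
        grad, = torch.autograd.grad(loss, r)
        with torch.no_grad():
            r -= step_size * grad.sign()   # descend the natural-structure loss
            r.clamp_(-epsilon, epsilon)    # keep the correction small
    with torch.no_grad():
        return classifier((x + r).clamp(0, 1))
```

Because the correction runs entirely at test time, a loop like this can wrap any pre-trained classifier, which matches the abstract's compatibility claim.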
Related papers
- Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defenses mainly focus on known attacks, but adversarial robustness to unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID).
We show that MID simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A^5$ is a framework for crafting a defensive perturbation that guarantees any attack on the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground-truth label, as sketched below.
We also show how to apply $A^5$ to create certifiably robust physical objects.
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
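A minimal sketch of such label-free defensive augmentation, assuming a hypothetical `robustifier` network that maps an image to a bounded protective perturbation (the budget and the tanh bounding are illustrative choices, not the paper's exact design):

```python
import torch

def defended_predict(x, robustifier, classifier, budget=8 / 255):
    """On-the-fly defensive augmentation sketch: the robustifier
    proposes a protective perturbation for this specific input,
    with no access to the ground-truth label."""
    with torch.no_grad():
        delta = budget * torch.tanh(robustifier(x))  # bounded defensive perturbation
        return classifier((x + delta).clamp(0, 1))
```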
- Increasing Confidence in Adversarial Robustness Evaluations [53.2174171468716]
We propose a test to identify weak attacks and thus weak defense evaluations.
Our test slightly modifies a neural network to guarantee the existence of an adversarial example for every sample.
For eleven out of thirteen previously-published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it.
arXiv Detail & Related papers (2022-06-28T13:28:13Z)
- Towards Robust Stacked Capsule Autoencoder with Hybrid Adversarial Training [0.0]
Capsule networks (CapsNets) are new neural networks that classify images based on the spatial relationships of features.
The stacked capsule autoencoder (SCAE) is a state-of-the-art CapsNet and was the first to achieve unsupervised classification with CapsNets.
We propose an evasion attack against SCAE in which the attacker generates adversarial perturbations that reduce the contribution of the object capsules, as sketched below.
We also evaluate the hybrid adversarial training defense; the experimental results show that the refined SCAE model achieves 82.14% classification accuracy under the evasion attack.
arXiv Detail & Related papers (2022-02-28T13:17:21Z)
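A minimal sketch of a capsule-contribution evasion attack in this spirit, assuming a hypothetical `object_capsule_presence` callable that returns the presence probabilities of the object capsules (budgets and step counts are illustrative):

```python
import torch

def capsule_evasion_attack(x, object_capsule_presence, steps=50,
                           step_size=1 / 255, epsilon=8 / 255):
    """Evasion-attack sketch: perturb the image to suppress the total
    contribution of the object capsules, degrading the unsupervised
    classification that relies on them."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        contribution = object_capsule_presence(x + delta).sum()
        grad, = torch.autograd.grad(contribution, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()  # push capsule contributions down
            delta.clamp_(-epsilon, epsilon)
    return (x + delta).clamp(0, 1).detach()
```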
- Preemptive Image Robustification for Protecting Users against Man-in-the-Middle Adversarial Attacks [16.017328736786922]
A Man-in-the-Middle adversary maliciously intercepts and perturbs images web users upload online.
This type of attack can raise severe ethical concerns on top of simple performance degradation.
We devise a novel bi-level optimization algorithm that finds points in the vicinity of natural images that are robust to adversarial perturbations; a sketch follows.
arXiv Detail & Related papers (2021-12-10T16:06:03Z)
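A minimal sketch of such a bi-level scheme, assuming a PGD-style inner adversary and that the image owner knows the label `y`; all loop counts, budgets, and step sizes are illustrative, not the paper's algorithm:

```python
import torch
import torch.nn.functional as F

def preemptively_robustify(x, y, model, outer_steps=5, inner_steps=5,
                           outer_lr=1 / 255, atk_eps=8 / 255, atk_lr=2 / 255):
    """Bi-level sketch: before release, shift the image to a nearby
    point whose worst-case adversarial loss is small.
    Inner loop: PGD adversary around the current point.
    Outer loop: update the robustifying shift r against that adversary."""
    r = torch.zeros_like(x)
    for _ in range(outer_steps):
        # Inner maximization: strongest perturbation within the budget.
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(inner_steps):
            loss = F.cross_entropy(model(x + r + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += atk_lr * grad.sign()
                delta.clamp_(-atk_eps, atk_eps)
        # Outer minimization: reduce the worst-case loss found above.
        r = r.detach().requires_grad_(True)
        worst = F.cross_entropy(model(x + r + delta.detach()), y)
        grad_r, = torch.autograd.grad(worst, r)
        r = (r - outer_lr * grad_r.sign()).detach()
    return (x + r).clamp(0, 1)
```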
- Adversarial Purification through Representation Disentanglement [21.862799765511976]
Deep learning models are vulnerable to adversarial examples and make incomprehensible mistakes.
Current defense methods, especially purification, tend to remove "noise" by learning and recovering the natural images.
In this work, we propose a novel adversarial purification scheme that disentangles natural images from adversarial perturbations as a preprocessing defense.
arXiv Detail & Related papers (2021-10-15T01:45:31Z)
- Online Alternate Generator against Adversarial Attacks [144.45529828523408]
Deep learning models are notoriously sensitive to adversarial examples, which are synthesized by adding quasi-imperceptible noise to real images.
We propose a portable defense method, online alternate generator, which does not need to access or modify the parameters of the target networks.
The proposed method works by synthesizing another image from scratch online for each input image, instead of removing or destroying adversarial noise; a sketch of the idea follows.
arXiv Detail & Related papers (2020-09-17T07:11:16Z)
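The sketch below illustrates the idea under a deep-image-prior-style assumption: a freshly initialized generator is optimized online to re-synthesize the input, and the reconstruction (which tends not to reproduce high-frequency adversarial noise) is what the untouched target network classifies. `make_generator` is a hypothetical factory; the loss and step count are illustrative.

```python
import torch
import torch.nn.functional as F

def purify_by_online_generation(x, make_generator, target_model,
                                steps=200, lr=1e-2):
    """Online-generator sketch: synthesize a substitute image from
    scratch for each input instead of denoising the input itself;
    the target network's parameters are never accessed or modified."""
    gen = make_generator()                   # fresh generator per input
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    z = torch.randn_like(x)                  # fixed random seed image
    for _ in range(steps):
        recon = gen(z)
        loss = F.mse_loss(recon, x)          # match the input's content
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return target_model(gen(z).clamp(0, 1))
```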
- AdvFoolGen: Creating Persistent Troubles for Deep Classifiers [17.709146615433458]
We present a new black-box attack termed AdvFoolGen, which can generate attacking images from the same feature space as that of the natural images.
We demonstrate the effectiveness and robustness of our attack in the face of state-of-the-art defense techniques.
arXiv Detail & Related papers (2020-07-20T21:27:41Z)
- GraCIAS: Grassmannian of Corrupted Images for Adversarial Security [4.259219671110274]
In this work, we propose a defense strategy that applies random image corruptions to the input image alone.
We develop proximity relationships between the projection operator of a clean image and of its adversarially perturbed version, via bounds relating geodesic distance on the Grassmannian to matrix Frobenius norms.
Unlike state-of-the-art approaches, even without any retraining, the proposed strategy achieves an absolute improvement of 4.5% in defense accuracy on ImageNet; a simplified sketch follows.
arXiv Detail & Related papers (2020-05-06T16:17:12Z)
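A heavily simplified sketch of an input-corruption defense in this spirit: several randomly corrupted copies of the input are classified and their softmax outputs averaged. The Gaussian blur corruption and ensemble size are illustrative stand-ins for the paper's Grassmannian projection construction.

```python
import torch
import torchvision.transforms.functional as TF

def corrupted_ensemble_predict(x, model, n_draws=8):
    """Input-corruption defense sketch: classify several randomly
    corrupted copies of the input and average the predictions."""
    probs = 0.0
    for _ in range(n_draws):
        sigma = float(torch.empty(1).uniform_(0.5, 2.0))  # random blur strength
        x_corrupt = TF.gaussian_blur(x, kernel_size=5, sigma=sigma)
        with torch.no_grad():
            probs = probs + model(x_corrupt).softmax(dim=-1)
    return probs / n_draws
```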
- Deflecting Adversarial Attacks [94.85315681223702]
We present a new approach towards ending the cycle of attacks and defenses: we "deflect" adversarial attacks by causing the attacker to produce an input that resembles the attack's target class.
We first propose a stronger defense based on Capsule Networks that combines three detection mechanisms to achieve state-of-the-art detection performance.
arXiv Detail & Related papers (2020-02-18T06:59:13Z)