Adversarial examples by perturbing high-level features in intermediate
decoder layers
- URL: http://arxiv.org/abs/2110.07182v1
- Date: Thu, 14 Oct 2021 07:08:15 GMT
- Title: Adversarial examples by perturbing high-level features in intermediate
decoder layers
- Authors: Vojtěch Čermák, Lukáš Adam
- Abstract summary: Instead of perturbing pixels, we use an encoder-decoder representation of the input image and perturb intermediate layers in the decoder.
Our perturbation possesses semantic meaning, such as a longer beak or green tints.
We show that our method modifies key features such as edges and that defence techniques based on adversarial training are vulnerable to our attacks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel method for creating adversarial examples. Instead of
perturbing pixels, we use an encoder-decoder representation of the input image
and perturb intermediate layers in the decoder. This changes the high-level
features provided by the generative model. Therefore, our perturbation
possesses semantic meaning, such as a longer beak or green tints. We formulate
this task as an optimization problem by minimizing the Wasserstein distance
between the adversarial and initial images under a misclassification
constraint. We employ the projected gradient method with a simple inexact
projection. Due to the projection, all iterations are feasible, and our method
always generates adversarial images. We perform numerical experiments on the
MNIST and ImageNet datasets in both targeted and untargeted settings. We
demonstrate that our adversarial images are much less vulnerable to
steganographic defence techniques than pixel-based attacks. Moreover, we show
that our method modifies key features such as edges and that defence techniques
based on adversarial training are vulnerable to our attacks.
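A minimal sketch of the core idea, not the authors' code: the optimization variable lives in the decoder's intermediate feature space rather than in pixel space, so the resulting image changes are high-level and semantic. It assumes a PyTorch model split into hypothetical `decoder_head` / `decoder_tail` modules plus a `classifier` under attack, and it replaces the paper's Wasserstein objective and inexact projection with simple stand-ins (an L2 image distance and a hinge penalty on misclassification).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy modules standing in for a pretrained generative model split
# into an early decoder part (whose output we perturb) and the remaining
# decoder layers, plus the classifier being attacked.
decoder_head = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
decoder_tail = nn.Sequential(nn.Linear(128, 28 * 28), nn.Sigmoid())
classifier = nn.Linear(28 * 28, 10)

def attack(z, true_label, steps=200, lr=0.05):
    """Perturb an intermediate decoder activation so that the decoded image is
    misclassified while staying close to the originally decoded image."""
    with torch.no_grad():
        h0 = decoder_head(z)        # intermediate high-level features
        x0 = decoder_tail(h0)       # original reconstructed image
    delta = torch.zeros_like(h0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = decoder_tail(h0 + delta)
        logits = classifier(x_adv)
        # Stand-in objective: L2 image distance plus a hinge term pushing the
        # true class below the best competing class (untargeted setting).
        # The paper instead minimizes a Wasserstein distance under a hard
        # misclassification constraint enforced by an inexact projection.
        dist = F.mse_loss(x_adv, x0)
        others = logits.clone()
        others[:, true_label] = float("-inf")
        margin = logits[:, true_label] - others.max(dim=1).values
        loss = dist + 10.0 * torch.clamp(margin, min=0.0).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder_tail(h0 + delta).detach()

# Usage with random data (untargeted); z would normally come from the encoder.
x_adv = attack(torch.randn(1, 64), true_label=3)
```

Because the gradient steps act on decoder features instead of pixels, the decoded perturbation tends to alter structured attributes of the image (the "longer beak or green tints" described above) rather than adding per-pixel noise.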
Related papers
- IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks [16.577595936609665]
We introduce a novel approach to counter adversarial attacks, namely, image resampling.
Image resampling transforms a discrete image into a new one, simulating the process of scene recapturing or rerendering as specified by a geometrical transformation.
We show that our method significantly enhances the adversarial robustness of diverse deep models against various attacks while maintaining high accuracy on clean images.
arXiv Detail & Related papers (2023-10-18T11:19:32Z) - SAIF: Sparse Adversarial and Imperceptible Attack Framework [7.025774823899217]
We propose a novel attack technique called the Sparse Adversarial and Interpretable Attack Framework (SAIF).
Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers.
SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
arXiv Detail & Related papers (2022-12-14T20:28:50Z) - Scale-free Photo-realistic Adversarial Pattern Attack [20.818415741759512]
Generative Adversarial Networks (GANs) can partially address this problem by synthesizing more semantically meaningful texture patterns.
In this paper, we propose a scale-free generation-based attack algorithm that synthesizes semantically meaningful adversarial patterns globally to images with arbitrary scales.
arXiv Detail & Related papers (2022-08-12T11:25:39Z) - Adaptive Perturbation for Adversarial Attack [50.77612889697216]
We propose a new gradient-based attack method for adversarial examples.
We use the exact gradient direction with a scaling factor for generating adversarial perturbations.
Our method exhibits higher transferability and outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2021-11-27T07:57:41Z) - Transferable Sparse Adversarial Attack [62.134905824604104]
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700× faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z) - Feature Space Targeted Attacks by Statistic Alignment [74.40447383387574]
Feature space targeted attacks perturb images by modulating their intermediate feature maps.
The current choice of pixel-wise Euclidean distance to measure the discrepancy is questionable because it unreasonably imposes a spatial-consistency constraint on the source and target features.
We propose two novel approaches called Pair-wise Alignment Attack and Global-wise Alignment Attack, which attempt to measure similarities between feature maps by high-order statistics.
arXiv Detail & Related papers (2021-05-25T03:46:39Z) - Error Diffusion Halftoning Against Adversarial Examples [85.11649974840758]
Adversarial examples contain carefully crafted perturbations that can fool deep neural networks into making wrong predictions.
We propose a new image transformation defense based on error diffusion halftoning, and combine it with adversarial training to defend against adversarial examples.
arXiv Detail & Related papers (2021-01-23T07:55:02Z) - Context-Aware Image Denoising with Auto-Threshold Canny Edge Detection
to Suppress Adversarial Perturbation [0.8021197489470756]
This paper presents a novel context-aware image denoising algorithm.
It combines an adaptive image smoothing technique and color reduction techniques to remove perturbation from adversarial images.
Our results show that the proposed approach reduces adversarial perturbation in adversarial attacks and increases the robustness of the deep convolutional neural network models.
arXiv Detail & Related papers (2021-01-14T19:15:28Z) - Patch-wise Attack for Fooling Deep Neural Network [153.59832333877543]
We propose a patch-wise iterative algorithm -- a black-box attack against mainstream normally trained and defense models.
We significantly improve the success rate by 9.2% for defense models and 3.7% for normally trained models on average.
arXiv Detail & Related papers (2020-07-14T01:50:22Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.