Generating Semantic Adversarial Examples via Feature Manipulation
- URL: http://arxiv.org/abs/2001.02297v2
- Date: Fri, 20 May 2022 11:58:46 GMT
- Title: Generating Semantic Adversarial Examples via Feature Manipulation
- Authors: Shuo Wang, Surya Nepal, Carsten Rudolph, Marthie Grobler, Shangyu Chen, Tianle Chen
- Abstract summary: We propose a more practical adversarial attack by designing structured perturbation with semantic meanings.
Our proposed technique manipulates the semantic attributes of images via the disentangled latent codes.
We demonstrate the existence of a universal, image-agnostic semantic adversarial example.
- Score: 23.48763375455514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The vulnerability of deep neural networks to adversarial attacks has been widely demonstrated (e.g., adversarial example attacks). Traditional attacks perform unstructured pixel-wise perturbations to fool a classifier. An alternative approach is to perturb the latent space; however, such perturbations are hard to control due to the lack of interpretability and disentanglement. In this paper, we propose a more practical adversarial attack that designs structured perturbations with semantic meaning. Our technique manipulates the semantic attributes of images via disentangled latent codes. The intuition is that images in similar domains share some common but theme-independent semantic attributes, e.g., the thickness of lines in handwritten digits, that can be bidirectionally mapped to disentangled latent codes. We generate adversarial perturbations by manipulating a single latent code or a combination of these codes, and we propose two unsupervised semantic manipulation approaches, vector-based and feature-map-based disentangled representations, which differ in the complexity of the latent codes and the smoothness of the reconstructed images. We conduct extensive experimental evaluations on real-world image data to demonstrate the power of our attacks against black-box classifiers. We further demonstrate the existence of a universal, image-agnostic semantic adversarial example.
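To make the attack concrete, below is a minimal sketch of the latent-code manipulation idea under stated assumptions: `Encoder`, `Decoder`, and `semantic_attack` are illustrative stand-ins (untrained here), not the authors' code; a real attack would pair a trained disentangled model (e.g., a beta-VAE) with the target black-box classifier.

```python
# Minimal sketch of semantic adversarial attack via a disentangled latent code.
# All models are untrained stand-ins; real use requires trained counterparts.
import torch
import torch.nn as nn

LATENT_DIM = 10

class Encoder(nn.Module):  # stand-in for a disentangled encoder
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, LATENT_DIM))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):  # stand-in for the paired decoder
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 28 * 28), nn.Sigmoid())
    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)

def semantic_attack(x, encoder, decoder, classifier, attr_dim, span=3.0, steps=61):
    """Sweep one disentangled latent code (a semantic attribute such as stroke
    thickness) and return the first reconstruction that flips the black-box
    classifier's label."""
    with torch.no_grad():
        z = encoder(x)
        label = classifier(x).argmax(dim=1)
        for delta in torch.linspace(-span, span, steps):
            z_adv = z.clone()
            z_adv[:, attr_dim] += delta          # structured, semantic perturbation
            x_adv = decoder(z_adv)               # stays on the image manifold
            if classifier(x_adv).argmax(dim=1) != label:
                return x_adv, delta.item()
    return None, None

# Toy usage with untrained stand-ins.
encoder, decoder = Encoder(), Decoder()
classifier = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
x_adv, delta = semantic_attack(x, encoder, decoder, classifier, attr_dim=0)
print("label flipped at delta =", delta)
```

Because the perturbation is applied to a single disentangled code and decoded back through the generator, the result changes a semantic attribute rather than adding unstructured pixel noise.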
Related papers
- Transcending Adversarial Perturbations: Manifold-Aided Adversarial
Examples with Legitimate Semantics [10.058463432437659]
Deep neural networks are highly vulnerable to adversarial examples crafted with tiny malicious perturbations.
In this paper, we propose a supervised semantic-transformation generative model to generate adversarial examples with real and legitimate semantics.
Experiments on MNIST and industrial defect datasets show that our adversarial examples not only exhibit better visual quality but also achieve superior attack transferability.
arXiv Detail & Related papers (2024-02-05T15:25:40Z)
- Instruct2Attack: Language-Guided Semantic Adversarial Attacks [76.83548867066561]
Instruct2Attack (I2A) is a language-guided semantic attack that generates meaningful perturbations according to free-form language instructions.
We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction.
We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses.
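A heavily simplified, conceptual sketch of adversarially guiding a reverse diffusion loop, not the I2A implementation: the denoiser and classifier are untrained stand-ins, text conditioning is omitted, and the update rule is schematic.

```python
# Conceptual sketch: nudge the latent toward a target class at each denoising
# step. A real attack would use a pretrained text-conditioned latent diffusion
# model and the victim classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

denoiser = nn.Linear(64, 64)        # stand-in for a conditioned noise predictor
classifier = nn.Linear(64, 10)      # stand-in for the victim model
target_class = 3
guidance_scale = 0.5

z = torch.randn(1, 64)              # initial noisy latent
for t in range(50, 0, -1):
    z = z.detach().requires_grad_(True)
    eps = denoiser(z)               # predicted noise at step t
    z_denoised = z - 0.02 * eps     # simplified denoising update
    # Adversarial guidance: gradient step toward the target class.
    loss = F.cross_entropy(classifier(z_denoised), torch.tensor([target_class]))
    grad, = torch.autograd.grad(loss, z)
    z = z_denoised - guidance_scale * grad
print("final adversarial latent:", z.shape)
```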
arXiv Detail & Related papers (2023-11-27T05:35:49Z)
- IRAD: Implicit Representation-driven Image Resampling against Adversarial Attacks [16.577595936609665]
We introduce a novel approach to counter adversarial attacks, namely, image resampling.
Image resampling transforms a discrete image into a new one, simulating the process of scene recapturing or rerendering as specified by a geometrical transformation.
We show that our method significantly enhances the adversarial robustness of diverse deep models against various attacks while maintaining high accuracy on clean images.
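A minimal illustration of the general resampling idea, not IRAD's implicit-representation method: re-render the input under a slight random geometric transform before classification, which can disrupt pixel-aligned adversarial noise.

```python
# Resample a batch of images with a small random affine transform and feed
# the resampled images to the classifier instead of the originals.
import torch
import torch.nn.functional as F

def resample(x, max_shift=0.03):
    n = x.size(0)
    theta = torch.eye(2, 3).unsqueeze(0).repeat(n, 1, 1)        # identity affine
    theta[:, :, 2] = (torch.rand(n, 2) - 0.5) * 2 * max_shift   # tiny translation
    grid = F.affine_grid(theta, x.size(), align_corners=False)
    return F.grid_sample(x, grid, mode='bilinear', align_corners=False)

x = torch.rand(4, 3, 32, 32)        # stand-in input batch
x_resampled = resample(x)
print(x_resampled.shape)
```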
arXiv Detail & Related papers (2023-10-18T11:19:32Z)
- Uncertainty-based Detection of Adversarial Attacks in Semantic Segmentation [16.109860499330562]
We introduce an uncertainty-based approach for the detection of adversarial attacks in semantic segmentation.
We demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
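One plausible reading of such an approach, sketched below under the assumption that per-pixel predictive entropy is the uncertainty measure (the paper's exact measure may differ): flag inputs whose mean entropy is unusually high, with the threshold calibrated on clean data.

```python
# Flag a segmentation input as suspicious when its mean per-pixel predictive
# entropy exceeds a threshold calibrated on clean images.
import torch
import torch.nn.functional as F

def is_suspicious(logits, threshold=1.5):
    """logits: (N, num_classes, H, W) from a segmentation model."""
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1)  # (N, H, W)
    return entropy.mean(dim=(1, 2)) > threshold               # per-image flag

logits = torch.randn(2, 19, 64, 64)   # e.g., 19 Cityscapes classes
print(is_suspicious(logits))
```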
arXiv Detail & Related papers (2023-05-22T08:36:35Z)
- Content-based Unrestricted Adversarial Attack [53.181920529225906]
We propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack.
By leveraging a low-dimensional manifold that represents natural images, we map images onto the manifold and optimize them along the manifold's adversarial direction.
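A rough sketch of the manifold-constrained optimization pattern this describes, with untrained stand-in models: only the latent code is updated, so every iterate remains a decoded, natural-looking image.

```python
# Latent-space optimization against a classifier; generator and classifier
# are untrained stand-ins, not the paper's models.
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(nn.Linear(32, 28 * 28), nn.Sigmoid())  # manifold map
classifier = nn.Linear(28 * 28, 10)                              # victim model
true_label = torch.tensor([7])

z = torch.randn(1, 32, requires_grad=True)     # point on the latent manifold
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    x = generator(z)                           # always a decoded, natural image
    loss = -F.cross_entropy(classifier(x), true_label)  # ascend classifier loss
    loss.backward()
    opt.step()
x_adv = generator(z).detach()
print("adversarial confidence:", F.softmax(classifier(x_adv), 1).max())
```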
arXiv Detail & Related papers (2023-05-18T02:57:43Z)
- Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter the view that adversarial perturbations are meaningless noise by providing evidence of human-meaningful patterns in them.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z)
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by training on adversarial examples. However, models trained on seen types of adversarial examples generally do not generalize well to unseen types.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find a consistent relationship between perturbations and prediction confidence, which guides us to detect few-perturbation attacks from the perspective of prediction confidence. We propose a detection method that goes beyond image space, using a two-stream architecture in which an image stream focuses on pixel artifacts and a gradient stream copes with confidence artifacts.
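A hedged sketch of the two-stream idea: the real method uses learned streams, whereas the hand-crafted statistics below (high-frequency energy for pixel artifacts, top-2 probability margin for confidence artifacts) merely illustrate the kind of features each stream could supply to a small detector.

```python
# Compute one pixel-space feature and one confidence feature per image; a
# small detector (not shown) would be trained on such pairs.
import torch
import torch.nn.functional as F

def detector_features(x, logits):
    # Pixel stream: high-frequency energy via a Laplacian-like filter.
    kernel = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    kernel = kernel.view(1, 1, 3, 3).repeat(x.size(1), 1, 1, 1)
    hf = F.conv2d(x, kernel, groups=x.size(1)).abs().mean(dim=(1, 2, 3))
    # Confidence stream: margin between the top two class probabilities.
    probs = F.softmax(logits, dim=1)
    top2 = probs.topk(2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]
    return torch.stack([hf, margin], dim=1)  # (N, 2) features for a detector

x = torch.rand(4, 3, 32, 32)
logits = torch.randn(4, 10)
print(detector_features(x, logits))
```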
arXiv Detail & Related papers (2021-02-23T09:55:03Z)
- Detecting Adversarial Examples by Input Transformations, Defense Perturbations, and Voting [71.57324258813674]
Convolutional neural networks (CNNs) have been shown to reach super-human performance in visual recognition tasks. However, CNNs can easily be fooled by adversarial examples, i.e., maliciously crafted images that force the networks to predict an incorrect output.
This paper extensively explores the detection of adversarial examples via image transformations and proposes a novel methodology.
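An assumed simplification of the transformation-and-voting recipe: classify several mildly transformed copies of the input and flag the image when the predictions disagree.

```python
# Detection by prediction disagreement across mild input transformations.
# The classifier is an untrained stand-in.
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

def transformed_votes(x):
    variants = [
        x,
        torch.flip(x, dims=[3]),                                  # horizontal flip
        F.interpolate(F.interpolate(x, scale_factor=0.5, mode='bilinear'),
                      size=x.shape[2:], mode='bilinear'),          # down/up resample
    ]
    return [classifier(v).argmax(dim=1) for v in variants]

def is_adversarial(x):
    votes = torch.stack(transformed_votes(x), dim=0)  # (num_transforms, N)
    return (votes != votes[0]).any(dim=0)             # disagreement flag per image

x = torch.rand(4, 3, 32, 32)
print(is_adversarial(x))
```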
arXiv Detail & Related papers (2021-01-27T14:50:41Z) - Adversarial Defense by Latent Style Transformations [20.78877614953599]
We investigate an attack-agnostic defense against adversarial attacks on high-resolution images by detecting suspicious inputs.
The intuition behind our approach is that the essential characteristics of a normal image are generally preserved under non-essential style transformations.
arXiv Detail & Related papers (2020-06-17T07:56:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.