Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective
- URL: http://arxiv.org/abs/2306.00353v2
- Date: Sun, 11 Feb 2024 14:36:00 GMT
- Title: Constructing Semantics-Aware Adversarial Examples with Probabilistic Perspective
- Authors: Andi Zhang, Mingtian Zhang, Damon Wischik
- Abstract summary: We present a method for creating semantics-aware adversarial examples.
Our method produces adversarial perturbations that maintain the original image's semantics.
It offers users the flexibility to inject their own understanding of semantics into the adversarial examples.
- Score: 4.685487217906502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a probabilistic perspective on adversarial examples. This
perspective allows us to view geometric restrictions on adversarial examples as
distributions, enabling a seamless shift towards data-driven, semantic
constraints. Building on this foundation, we present a method for creating
semantics-aware adversarial examples in a principled way. Leveraging the
advanced generalization capabilities of contemporary probabilistic generative
models, our method produces adversarial perturbations that maintain the
original image's semantics. Moreover, it offers users the flexibility to inject
their own understanding of semantics into the adversarial examples. Our
empirical findings indicate that the proposed methods achieve enhanced
transferability and higher success rates in circumventing adversarial defense
mechanisms, while maintaining a low detection rate by human observers.
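The abstract stays at the conceptual level, so the following is only a minimal sketch of the general idea, assuming a differentiable PyTorch classifier and a hypothetical log_prior(x) function standing in for the log-density of a pretrained generative model (neither name comes from the paper): the hard epsilon-ball constraint of geometric attacks is replaced by a soft, data-driven prior term.

```python
# Illustrative sketch (not the authors' algorithm): search for an adversarial
# input by trading off attack strength against a data-driven "semantic" prior,
# instead of projecting onto a fixed epsilon-ball.
# `classifier` and `log_prior` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def semantics_aware_attack(x, y, classifier, log_prior,
                           steps=200, lr=1e-2, lam=1.0):
    """Maximize classification loss while staying likely under the prior."""
    x_adv = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        attack = -F.cross_entropy(classifier(x_adv), y)  # push the prediction away from y
        prior = -log_prior(x_adv).mean()                 # penalize leaving the data manifold
        (attack + lam * prior).backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0.0, 1.0)                       # keep a valid pixel range
    return x_adv.detach()
```

With lam = 0 this reduces to unconstrained loss maximization; increasing lam pulls the search back toward the prior's high-density (semantically plausible) region, playing roughly the role that the radius epsilon plays in geometric attacks.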
Related papers
- Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training [43.766504246864045]
We propose a novel uncertainty-aware distributional adversarial training method.
Our approach achieves state-of-the-art adversarial robustness and maintains natural performance.
arXiv Detail & Related papers (2024-11-05T07:26:24Z)
- Transcending Adversarial Perturbations: Manifold-Aided Adversarial Examples with Legitimate Semantics [10.058463432437659]
Deep neural networks are significantly vulnerable to adversarial examples crafted with tiny, malicious perturbations.
In this paper, we propose a supervised semantic-transformation generative model to generate adversarial examples with real and legitimate semantics.
Experiments on MNIST and industrial defect datasets showed that our adversarial examples not only exhibited better visual quality but also achieved superior attack transferability.
arXiv Detail & Related papers (2024-02-05T15:25:40Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- Generating Less Certain Adversarial Examples Improves Robust Generalization [22.00283527210342]
This paper revisits the robust overfitting phenomenon of adversarial training.
We argue that overconfidence in predicting adversarial examples is a potential cause.
We propose a formal definition of adversarial certainty that captures the variance of the model's predicted logits on adversarial examples.
arXiv Detail & Related papers (2023-10-06T19:06:13Z)
- Mist: Towards Improved Adversarial Examples for Diffusion Models [0.8883733362171035]
Diffusion Models (DMs) have enabled great success in artificial-intelligence-generated content, especially in artwork creation.
However, infringers can profit by using DMs to imitate human-created paintings without authorization.
Recent research suggests that various adversarial examples for diffusion models can serve as effective tools against these copyright infringements.
arXiv Detail & Related papers (2023-05-22T03:43:34Z)
- The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training [72.39526433794707]
Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples.
We propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its "inverse adversarial" counterpart; a rough sketch of one possible reading follows this entry.
Our training method achieves state-of-the-art robustness as well as natural accuracy.
arXiv Detail & Related papers (2022-11-01T15:24:26Z)
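The "inverse adversary" entry above names the idea without the details, so the snippet below is a hedged reading rather than the authors' scheme: it assumes the inverse adversarial example is produced by descending the loss instead of ascending it, and that training adds a consistency term between the two outputs. model, fgsm_step, and the loss weights are illustrative placeholders.

```python
# Hedged sketch of one possible reading of "inverse adversarial" training:
# perturb each input both to maximize the loss (adversarial) and to minimize
# it (inverse adversarial), then encourage similar outputs on the two versions.
import torch
import torch.nn.functional as F

def fgsm_step(model, x, y, eps, ascend=True):
    """One signed-gradient step; ascend=True attacks, ascend=False is the 'inverse'."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    direction = grad.sign() if ascend else -grad.sign()
    return (x + eps * direction).clamp(0.0, 1.0).detach()

def training_loss(model, x, y, eps=8 / 255, consistency_weight=1.0):
    x_adv = fgsm_step(model, x, y, eps, ascend=True)    # adversarial example
    x_inv = fgsm_step(model, x, y, eps, ascend=False)   # inverse adversarial example
    logits_adv, logits_inv = model(x_adv), model(x_inv)
    robust = F.cross_entropy(logits_adv, y)
    consistency = F.kl_div(F.log_softmax(logits_adv, dim=1),
                           F.softmax(logits_inv, dim=1), reduction="batchmean")
    return robust + consistency_weight * consistency
```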
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE); the wiring is sketched after this entry.
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
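The RTFE summary above only says that a deep pre-processing module is placed in front of a pretrained network. The sketch below shows that wiring and nothing more; the small convolutional pre-processor and the decision to freeze the classifier are assumptions for illustration, not the paper's architecture or training objective.

```python
# Structural sketch only: a learned pre-processing module in front of a frozen,
# naturally pre-trained classifier.
import torch
import torch.nn as nn

class PreprocessedClassifier(nn.Module):
    def __init__(self, classifier: nn.Module, channels: int = 3):
        super().__init__()
        # Hypothetical pre-processor: a small image-to-image network.
        self.preprocess = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )
        self.classifier = classifier
        for p in self.classifier.parameters():   # keep the pre-trained model frozen
            p.requires_grad_(False)

    def forward(self, x):
        return self.classifier(self.preprocess(x))
```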
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
However, models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find that perturbations and prediction confidence are closely related, which guides us to detect few-perturbation attacks from the perspective of prediction confidence.
We propose a method that goes beyond image space via a two-stream architecture, in which the image stream focuses on pixel artifacts and the gradient stream copes with confidence artifacts; a rough sketch follows this entry.
arXiv Detail & Related papers (2021-02-23T09:55:03Z)
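The "beyond image space" entry above describes a two-stream detector, with an image stream for pixel artifacts and a gradient stream for confidence artifacts. The sketch below is one assumption-laden reading: the tiny convolutional streams and the choice to feed in the input gradient of the top class score are placeholders, not the paper's design.

```python
# Hedged sketch of a two-stream adversarial-example detector: an image stream
# over raw pixels and a gradient stream over input gradients of the predicted
# class score (a stand-in for "confidence artifacts"). Illustrative only.
import torch
import torch.nn as nn

def stream(channels):
    # Tiny feature extractor producing a 16-dimensional summary per input.
    return nn.Sequential(
        nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class TwoStreamDetector(nn.Module):
    def __init__(self, classifier: nn.Module, channels: int = 3):
        super().__init__()
        self.classifier = classifier
        self.image_stream = stream(channels)
        self.gradient_stream = stream(channels)
        self.head = nn.Linear(32, 2)              # clean vs. adversarial

    def confidence_gradient(self, x):
        with torch.enable_grad():
            x = x.detach().clone().requires_grad_(True)
            top_score = self.classifier(x).max(dim=1).values.sum()
            grad, = torch.autograd.grad(top_score, x)
        return grad.detach()

    def forward(self, x):
        g = self.confidence_gradient(x)
        feats = torch.cat([self.image_stream(x), self.gradient_stream(g)], dim=1)
        return self.head(feats)
```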
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider non-robust features to be a common property of adversarial examples, and we deduce that a corresponding cluster can be found in representation space.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector; a rough sketch of this last step follows this entry.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
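The cluster-separation entry above ends by describing a likelihood-based detector over adversarial representations. A minimal sketch of that last step follows, assuming a single Gaussian fitted to the representations of known adversarial examples and a simple likelihood threshold; the paper's actual density model and decision rule may differ.

```python
# Hedged sketch of a likelihood-based adversarial detector: model the cluster of
# adversarial representations with a single Gaussian and flag inputs whose
# representation is too likely under it. Simplified; not the paper's exact model.
import torch
from torch.distributions import MultivariateNormal

def fit_adversarial_cluster(adv_representations):
    """adv_representations: [N, D] features of known adversarial examples."""
    mean = adv_representations.mean(dim=0)
    centered = adv_representations - mean
    cov = centered.T @ centered / (adv_representations.shape[0] - 1)
    cov += 1e-4 * torch.eye(cov.shape[0])        # regularize for invertibility
    return MultivariateNormal(mean, covariance_matrix=cov)

def is_adversarial(representation, adv_cluster, threshold):
    """Flag a single [D] representation whose log-likelihood exceeds a threshold."""
    return adv_cluster.log_prob(representation) > threshold
```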
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.