Towards Robustness against Unsuspicious Adversarial Examples
- URL: http://arxiv.org/abs/2005.04272v2
- Date: Thu, 8 Oct 2020 16:58:09 GMT
- Title: Towards Robustness against Unsuspicious Adversarial Examples
- Authors: Liang Tong, Minzhe Guo, Atul Prakash, Yevgeniy Vorobeychik
- Abstract summary: We propose an approach for modeling suspiciousness by leveraging cognitive salience.
We compute the resulting non-salience-preserving dual-perturbation attacks on classifiers.
We show that adversarial training with dual-perturbation attacks yields classifiers that are more robust to these attacks than state-of-the-art robust learning approaches.
- Score: 33.63338857434094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the remarkable success of deep neural networks, significant concerns
have emerged about their robustness to adversarial perturbations to inputs.
While most attacks aim to ensure that these are imperceptible, physical
perturbation attacks typically aim for being unsuspicious, even if perceptible.
However, there is no universal notion of what it means for adversarial examples
to be unsuspicious. We propose an approach for modeling suspiciousness by
leveraging cognitive salience. Specifically, we split an image into foreground
(salient region) and background (the rest), and allow significantly larger
adversarial perturbations in the background, while ensuring that cognitive
salience of background remains low. We describe how to compute the resulting
non-salience-preserving dual-perturbation attacks on classifiers. We then
experimentally demonstrate that our attacks indeed do not significantly change
perceptual salience of the background, but are highly effective against
classifiers robust to conventional attacks. Furthermore, we show that
adversarial training with dual-perturbation attacks yields classifiers that are
more robust to these than state-of-the-art robust learning approaches, and
comparable in terms of robustness to conventional attacks.
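The attack described in the abstract can be pictured as a projected-gradient attack with two region-wise budgets: a tight one on the salient foreground and a much looser one on the background, with the regions selected by a saliency mask. The sketch below is a minimal illustration under assumed choices (PyTorch, an L-infinity PGD loop, a precomputed binary saliency mask, and illustrative budgets); it is not the authors' code and omits the paper's additional requirement that the perturbed background's cognitive salience remain low.

    # Minimal PyTorch sketch (assumed, not the authors' implementation) of a
    # dual-perturbation PGD attack: a small L_inf budget eps_fg on the salient
    # foreground and a much larger budget eps_bg on the background.
    import torch
    import torch.nn.functional as F

    def dual_perturbation_attack(model, x, y, mask, eps_fg=2/255, eps_bg=16/255,
                                 step=2/255, iters=40):
        # x: (N, C, H, W) images in [0, 1]; mask: (N, 1, H, W), 1 on the foreground.
        eps = mask * eps_fg + (1 - mask) * eps_bg           # per-pixel budget
        delta = (torch.rand_like(x) * 2 - 1) * eps          # random start inside the box
        delta = (torch.clamp(x + delta, 0, 1) - x).requires_grad_(True)
        for _ in range(iters):
            loss = F.cross_entropy(model(x + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = delta.detach() + step * grad.sign()     # ascend the classification loss
            delta = torch.max(torch.min(delta, eps), -eps)  # region-wise L_inf projection
            delta = (torch.clamp(x + delta, 0, 1) - x).requires_grad_(True)
        return (x + delta).detach()

Under these assumptions, adversarial training with dual-perturbation attacks would simply replace clean inputs with dual_perturbation_attack(model, x, y, mask) in the training loop; how the saliency mask is produced is left open here.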
Related papers
- Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment [24.577363665112706]
Recent adversarial training techniques have utilized inverse adversarial attacks to generate high-confidence examples.
Our investigation reveals that high-confidence outputs under inverse adversarial attacks are correlated with biased feature activation.
We propose Debiased High-Confidence Adversarial Training (DHAT) to address this bias.
DHAT achieves state-of-the-art performance and exhibits robust generalization capabilities across various vision datasets.
arXiv Detail & Related papers (2024-08-12T11:56:06Z)
- Improving Adversarial Robustness with Self-Paced Hard-Class Pair Reweighting [5.084323778393556]
Adversarial training with untargeted attacks is one of the most recognized defense methods.
We find that the naturally imbalanced inter-class semantic similarity causes hard-class pairs to become virtual targets of each other.
We propose to upweight the hard-class pair loss during model optimization, which encourages learning discriminative features for hard classes.
arXiv Detail & Related papers (2022-10-26T22:51:36Z)
- Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to or even higher than that of attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z)
- Adversarial Visual Robustness by Causal Intervention [56.766342028800445]
Adversarial training is the de facto most promising defense against adversarial examples.
Yet, its passive nature inevitably prevents it from being immune to unknown attackers.
We provide a causal viewpoint of adversarial vulnerability: the cause is a confounder that ubiquitously exists in learning.
arXiv Detail & Related papers (2021-06-17T14:23:54Z)
- Towards Defending against Adversarial Examples via Attack-Invariant Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z)
- Exploring Misclassifications of Robust Neural Networks to Enhance Adversarial Attacks [3.3248768737711045]
We analyze the classification decisions of 19 different state-of-the-art neural networks trained to be robust against adversarial attacks.
We propose a novel loss function for adversarial attacks that consistently improves attack success rate.
arXiv Detail & Related papers (2021-05-21T12:10:38Z)
- Adversarial Examples Detection beyond Image Space [88.7651422751216]
We find a consistent relationship between perturbations and prediction confidence, which guides us to detect few-perturbation attacks from the perspective of prediction confidence.
We propose a detection method that goes beyond image space via a two-stream architecture, in which the image stream focuses on pixel artifacts and the gradient stream copes with confidence artifacts.
arXiv Detail & Related papers (2021-02-23T09:55:03Z)
- Robust Tracking against Adversarial Attacks [69.59717023941126]
We first attempt to generate adversarial examples on top of video sequences to improve the tracking robustness against adversarial attacks.
We apply the proposed adversarial attack and defense approaches to state-of-the-art deep tracking algorithms.
arXiv Detail & Related papers (2020-07-20T08:05:55Z)
- Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network (GAN)-based architecture to semantically generate high-quality adversarial gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)
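The last entry's idea of attacking only a small fraction of frames can be illustrated without the paper's GAN-based generator. The sketch below is an assumed, simplified variant: it picks the frames the loss is most sensitive to and runs a PGD-style perturbation restricted to those frames; the model interface, tensor shapes, and budgets are all illustrative assumptions.

    # Simplified, assumed sketch of a temporally sparse attack on a sequence model:
    # perturb only about 1/40 of the frames, chosen by gradient magnitude. The
    # paper's GAN-based silhouette generator is not reproduced here.
    import torch
    import torch.nn.functional as F

    def temporally_sparse_attack(model, seq, y, frame_frac=1/40, eps=8/255,
                                 step=2/255, iters=20):
        # seq: (N, T, C, H, W) silhouette sequence in [0, 1]; y: identity labels.
        N, T = seq.shape[:2]
        k = max(1, int(T * frame_frac))

        # Score each frame by how strongly the loss depends on it.
        probe = seq.clone().requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(probe), y), probe)
        top = grad.abs().flatten(2).sum(-1).topk(k, dim=1).indices       # (N, k)
        frame_mask = torch.zeros(N, T, 1, 1, 1, device=seq.device)
        frame_mask.scatter_(1, top.view(N, k, 1, 1, 1), 1.0)

        # PGD restricted to the selected frames only.
        delta = torch.zeros_like(seq).requires_grad_(True)
        for _ in range(iters):
            loss = F.cross_entropy(model(seq + delta), y)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta.detach() + step * grad.sign()) * frame_mask   # zero elsewhere
            delta = delta.clamp(-eps, eps)
            delta = (torch.clamp(seq + delta, 0, 1) - seq).requires_grad_(True)
        return (seq + delta).detach()

Under these assumptions, the fraction of attacked frames is controlled directly by frame_frac, mirroring the one-fortieth figure reported above.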
This list is automatically generated from the titles and abstracts of the papers on this site.