Adversarial alignment: Breaking the trade-off between the strength of an
attack and its relevance to human perception
- URL: http://arxiv.org/abs/2306.03229v1
- Date: Mon, 5 Jun 2023 20:26:17 GMT
- Title: Adversarial alignment: Breaking the trade-off between the strength of an
attack and its relevance to human perception
- Authors: Drew Linsley, Pinyuan Feng, Thibaut Boissin, Alekh Karkada Ashok,
Thomas Fel, Stephanie Olaiya, Thomas Serre
- Abstract summary: Adversarial attacks have long been considered the "Achilles' heel" of deep learning.
Here, we investigate how the robustness of DNNs to adversarial attacks has evolved as their accuracy on ImageNet has continued to improve.
- Score: 10.883174135300418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) are known to have a fundamental sensitivity to
adversarial attacks, perturbations of the input that are imperceptible to
humans yet powerful enough to change the visual decision of a model.
Adversarial attacks have long been considered the "Achilles' heel" of deep
learning, which may eventually force a shift in modeling paradigms.
Nevertheless, the formidable capabilities of modern large-scale DNNs have
somewhat eclipsed these early concerns. Do adversarial attacks continue to pose
a threat to DNNs?
Here, we investigate how the robustness of DNNs to adversarial attacks has
evolved as their accuracy on ImageNet has continued to improve. We measure
adversarial robustness in two different ways: First, we measure the smallest
adversarial attack needed to cause a model to change its object categorization
decision. Second, we measure how aligned successful attacks are with the
features that humans find diagnostic for object recognition. We find that
adversarial attacks are inducing bigger and more easily detectable changes to
image pixels as DNNs grow better on ImageNet, but these attacks are also
becoming less aligned with features that humans find diagnostic for
recognition. To better understand the source of this trade-off, we turn to the
neural harmonizer, a DNN training routine that encourages models to leverage
the same features as humans to solve tasks. Harmonized DNNs achieve the best of
both worlds and experience attacks that are detectable and affect features that
humans find diagnostic for recognition, meaning that attacks on these models
are more likely to be rendered ineffective by inducing similar effects on human
perception. Our findings suggest that the sensitivity of DNNs to adversarial
attacks can be mitigated by DNN scale, data scale, and training routines that
align models with biological intelligence.
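The two measurements described in the abstract can be made concrete with a short sketch. The code below is an illustrative reading of the setup, not the authors' code: the PGD attack, the binary search over the L2 budget, and the Spearman-rank alignment between the perturbation and a human feature-importance map (e.g., ClickMe-style annotations) are all assumptions; only the overall logic, minimal perturbation size as attack strength and correlation with human-diagnostic features as alignment, follows the abstract.

```python
# Hedged sketch of the two robustness measures, assuming a PyTorch image classifier.
import torch
import torch.nn.functional as F

def pgd_l2(model, x, y, eps, steps=20):
    """Untargeted L2-bounded PGD attack (illustrative)."""
    alpha = 2.5 * eps / steps
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        g = torch.autograd.grad(loss, delta)[0]
        g = g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
        delta = delta.detach() + alpha * g
        # project back onto the L2 ball of radius eps
        norm = delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
        delta = delta * torch.clamp(eps / (norm + 1e-12), max=1.0)
    return delta.detach()

def smallest_flipping_eps(model, x, y, lo=0.0, hi=10.0, iters=10):
    """Measure 1: binary-search the smallest L2 budget that changes the decision."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        flipped = model(x + pgd_l2(model, x, y, eps=mid)).argmax(1) != y
        lo, hi = (lo, mid) if flipped.all() else (mid, hi)
    return hi

def spearman(a, b):
    """Measure 2: rank correlation between an attack map and a human importance map."""
    ra = a.flatten().argsort().argsort().float()
    rb = b.flatten().argsort().argsort().float()
    ra, rb = ra - ra.mean(), rb - rb.mean()
    return (ra * rb).sum() / (ra.norm() * rb.norm() + 1e-12)

# Usage (illustrative), given `model`, a batch (x, y), and a human map `human_map`
# of shape (B, H, W):
#   eps_min = smallest_flipping_eps(model, x, y)        # attack strength
#   delta   = pgd_l2(model, x, y, eps=eps_min)
#   align   = spearman(delta.abs().sum(1), human_map)   # alignment with human features
```

The neural harmonizer mentioned in the abstract is, at a high level, a training routine that adds an alignment term to the task loss. The sketch below is an assumption about its general shape (cross-entropy plus a penalty pulling gradient-based saliency maps toward human importance maps); the exact formulation in the harmonization papers differs in its map extraction and normalization.

```python
# Hedged sketch of a harmonization-style objective (not the authors' released code).
import torch
import torch.nn.functional as F

def saliency_map(model, x, y):
    """Gradient of the true-class logit w.r.t. the input, reduced over channels."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.view(-1, 1)).sum()
    grad = torch.autograd.grad(score, x, create_graph=True)[0]
    return grad.abs().sum(1)  # (B, H, W)

def harmonized_loss(model, x, y, human_map, lam=1.0):
    """Task cross-entropy plus an alignment penalty between model and human maps."""
    ce = F.cross_entropy(model(x), y)
    sal = saliency_map(model, x, y)
    # normalize each map to [0, 1] per image before comparing
    sal = sal / (sal.flatten(1).amax(1).view(-1, 1, 1) + 1e-12)
    hum = human_map / (human_map.flatten(1).amax(1).view(-1, 1, 1) + 1e-12)
    return ce + lam * F.mse_loss(sal, hum)
```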
Related papers
- Relationship between Uncertainty in DNNs and Adversarial Attacks [0.0]
Deep Neural Networks (DNNs) have achieved state-of-the-art results and even outperformed human accuracy in many challenging tasks.
However, DNN predictions carry uncertainty and can be incorrect or fall outside a given confidence level.
arXiv Detail & Related papers (2024-09-20T05:38:38Z)
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks [5.024667090792856]
Deep neural networks (DNNs) have gained prominence in various applications, such as classification, recognition, and prediction.
A fundamental attribute of traditional DNNs is their vulnerability to modifications in input data, which has resulted in the investigation of adversarial attacks.
This study aims to challenge the efficacy and generalization of contemporary defense mechanisms against adversarial attacks.
arXiv Detail & Related papers (2023-08-12T05:21:34Z)
- Fixed Inter-Neuron Covariability Induces Adversarial Robustness [26.878913741674058]
The vulnerability to adversarial perturbations is a major flaw of Deep Neural Networks (DNNs).
We have developed the Self-Consistent Activation (SCA) layer, which comprises neurons whose activations are consistent with each other, as they conform to a fixed, but learned, covariability pattern.
Models with an SCA layer achieved high accuracy and exhibited significantly greater robustness than multi-layer perceptron models to state-of-the-art Auto-PGD adversarial attacks, without being trained on adversarially perturbed data.
arXiv Detail & Related papers (2023-08-07T23:46:14Z)
- Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data [15.084703823643311]
Spiking neural networks (SNNs) offer enhanced energy efficiency and biologically plausible data processing capabilities.
This paper delves into backdoor attacks in SNNs using neuromorphic datasets and diverse triggers.
We present various attack strategies, achieving an attack success rate of up to 100% while maintaining a negligible impact on clean accuracy.
arXiv Detail & Related papers (2023-02-13T11:34:17Z)
- Harmonizing the object recognition strategies of deep neural networks with humans [10.495114898741205]
We show that state-of-the-art deep neural networks (DNNs) are becoming less aligned with humans as their accuracy improves.
Our work represents the first demonstration that the scaling laws that are guiding the design of DNNs today have also produced worse models of human vision.
arXiv Detail & Related papers (2022-11-08T20:03:49Z)
- Neural Architecture Dilation for Adversarial Robustness [56.18555072877193]
A shortcoming of convolutional neural networks is that they are vulnerable to adversarial attacks.
This paper aims to improve the adversarial robustness of backbone CNNs that already achieve satisfactory accuracy.
With minimal computational overhead, the dilation architecture is expected to preserve the standard performance of the backbone CNN.
arXiv Detail & Related papers (2021-08-16T03:58:00Z)
- The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
- Towards Adversarial Patch Analysis and Certified Defense against Crowd Counting [61.99564267735242]
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Recent studies have demonstrated that deep neural network (DNN) methods are vulnerable to adversarial attacks.
We propose a robust attack strategy called Adversarial Patch Attack with Momentum to evaluate the robustness of crowd counting models.
arXiv Detail & Related papers (2021-04-22T05:10:55Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
- Perceptual Adversarial Robustness: Defense Against Unseen Threat Models [58.47179090632039]
A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception.
Under the neural perceptual threat model (NPTM), we develop novel perceptual adversarial attacks and defenses.
Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks.
arXiv Detail & Related papers (2020-06-22T22:40:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.