RAID: Randomized Adversarial-Input Detection for Neural Networks
- URL: http://arxiv.org/abs/2002.02776v1
- Date: Fri, 7 Feb 2020 13:27:29 GMT
- Title: RAID: Randomized Adversarial-Input Detection for Neural Networks
- Authors: Hasan Ferit Eniser, Maria Christakis, Valentin W\"ustholz
- Abstract summary: We propose a novel technique for adversarial-image detection, RAID, that trains a secondary classifier to identify differences in neuron activation values between benign and adversarial inputs.
RAID is more reliable and more effective than the state of the art when evaluated against six popular attacks.
- Score: 7.37305608518763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, neural networks have become the default choice for image
classification and many other learning tasks, even though they are vulnerable
to so-called adversarial attacks. To increase their robustness against these
attacks, there have emerged numerous detection mechanisms that aim to
automatically determine if an input is adversarial. However, state-of-the-art
detection mechanisms either rely on being tuned for each type of attack, or
they do not generalize across different attack types. To alleviate these
issues, we propose a novel technique for adversarial-image detection, RAID,
that trains a secondary classifier to identify differences in neuron activation
values between benign and adversarial inputs. Our technique is both more
reliable and more effective than the state of the art when evaluated against
six popular attacks. Moreover, a straightforward extension of RAID increases
its robustness against detection-aware adversaries without affecting its
effectiveness.
Related papers
- Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis [12.133306321357999]
We propose an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation.
We conduct a detailed analysis of uncertainty-based detection of adversarial attacks and various state-of-the-art neural networks.
Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method.
arXiv Detail & Related papers (2024-08-19T14:13:30Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID)
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z) - Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
Vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose the adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features of arbitrary attacking strength.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - Adversarial Attacks and Mitigation for Anomaly Detectors of
Cyber-Physical Systems [6.417955560857806]
In this work, we present an adversarial attack that simultaneously evades the anomaly detectors and rule checkers of a CPS.
Inspired by existing gradient-based approaches, our adversarial attack crafts noise over the sensor and actuator values, then uses a genetic algorithm to optimise the latter.
We implement our approach for two real-world critical infrastructure testbeds, successfully reducing the classification accuracy of their detectors by over 50% on average.
arXiv Detail & Related papers (2021-05-22T12:19:03Z) - ExAD: An Ensemble Approach for Explanation-based Adversarial Detection [17.455233006559734]
We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
arXiv Detail & Related papers (2021-03-22T00:53:07Z) - Adversarial robustness via stochastic regularization of neural
activation sensitivity [24.02105949163359]
We suggest a novel defense mechanism that simultaneously addresses both defense goals.
We flatten the gradients of the loss surface, making adversarial examples harder to find.
In addition, we push the decision away from correctly classified inputs by leveraging Jacobian regularization.
arXiv Detail & Related papers (2020-09-23T19:31:55Z) - Adversarial Feature Desensitization [12.401175943131268]
We propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field.
Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.
arXiv Detail & Related papers (2020-06-08T14:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.