On the Universal Adversarial Perturbations for Efficient Data-free
Adversarial Detection
- URL: http://arxiv.org/abs/2306.15705v1
- Date: Tue, 27 Jun 2023 02:54:07 GMT
- Title: On the Universal Adversarial Perturbations for Efficient Data-free
Adversarial Detection
- Authors: Songyang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang, Jin Ma, Ying Shan
- Abstract summary: We propose a data-agnostic adversarial detection framework, which elicits different responses to UAPs from normal and adversarial samples.
Experimental results show that our method achieves competitive detection performance on various text classification tasks.
- Score: 55.73320979733527
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Detecting adversarial samples that are carefully crafted to fool the model is
a critical step to socially-secure applications. However, existing adversarial
detection methods require access to sufficient training data, which brings
noteworthy concerns regarding privacy leakage and generalizability. In this
work, we validate that adversarial samples generated by attack algorithms are
strongly related to a specific vector in the high-dimensional input space. Such
vectors, namely UAPs (Universal Adversarial Perturbations), can be calculated
without original training data. Based on this discovery, we propose a
data-agnostic adversarial detection framework, which elicits different
responses to UAPs from normal and adversarial samples. Experimental results
show that our method achieves competitive detection performance on various text
classification tasks while keeping the time cost comparable to that of normal
inference.
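The abstract leaves the exact detection rule unspecified, so the following is only a minimal sketch of the idea: score each input by how strongly a precomputed UAP shifts the model's prediction, and flag inputs whose response deviates from what is typical of clean data. The classifier `model` acting on input embeddings, the `uap` vector, the step size `eps`, and the `threshold` calibrated on clean samples are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def uap_response_score(model, embeddings, uap, eps=1.0):
    """Score inputs by how much a fixed UAP shifts the model's prediction."""
    with torch.no_grad():
        log_p_clean = F.log_softmax(model(embeddings), dim=-1)
        p_pert = F.softmax(model(embeddings + eps * uap), dim=-1)
    # Per-sample KL divergence of the perturbed prediction from the clean one.
    return F.kl_div(log_p_clean, p_pert, reduction="none").sum(dim=-1)

def is_adversarial(model, embeddings, uap, threshold):
    # Normal and adversarial inputs are expected to respond differently to the
    # UAP, so a threshold calibrated on clean data separates the two.
    return uap_response_score(model, embeddings, uap) > threshold
```

In practice the threshold would be chosen from the score distribution of known-clean inputs, since the data-free setting assumes no adversarial examples are available in advance.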
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks have been shown to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach for identifying Unlearnable Examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- Detecting Adversarial Data via Perturbation Forgery [28.637963515748456]
Adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data.
New attacks based on generative models with imbalanced and anisotropic noise patterns evade detection.
We propose Perturbation Forgery, which includes noise distribution perturbation, sparse mask generation, and pseudo-adversarial data production, to train an adversarial detector capable of detecting unseen gradient-based, generative-model-based, and physical adversarial attacks.
arXiv Detail & Related papers (2024-05-25T13:34:16Z)
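A hedged sketch of the forgery idea summarized above: fabricate pseudo-adversarial samples by adding sparsely masked noise to natural data and train a binary detector to separate the two. The noise model, the mask ratio, and the helper names (`make_forged_batch`, `detector_training_step`) are illustrative assumptions rather than the authors' recipe.

```python
import torch
import torch.nn.functional as F

def make_forged_batch(x, noise_std=0.03, mask_ratio=0.3):
    """Forge pseudo-adversarial data: zero-mean noise applied through a sparse mask."""
    noise = noise_std * torch.randn_like(x)
    mask = (torch.rand_like(x) < mask_ratio).float()
    return (x + mask * noise).clamp(0.0, 1.0)

def detector_training_step(detector, optimizer, x_natural):
    """One step of training a binary detector on natural vs. forged samples."""
    x_forged = make_forged_batch(x_natural)
    inputs = torch.cat([x_natural, x_forged])
    labels = torch.cat([torch.zeros(len(x_natural)), torch.ones(len(x_forged))])
    loss = F.binary_cross_entropy_with_logits(detector(inputs).squeeze(-1), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```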
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical method that exploits the sensitivity of model predictions and feature attributions to detect adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.
arXiv Detail & Related papers (2024-04-12T21:22:21Z)
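A rough sketch of a PASA-style check under stated assumptions: compare how much the prediction and a simple input-times-gradient attribution change when small random noise is added, and flag inputs whose sensitivity crosses thresholds calibrated on clean data. The attribution method, noise scale, and thresholds are placeholders, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def _attribution(model, x):
    """A simple input-times-gradient saliency for the top predicted class."""
    x = x.detach().clone().requires_grad_(True)
    model(x).max(dim=-1).values.sum().backward()
    return (x.grad * x).detach()

def pasa_flags(model, x, sigma=0.02, t_pred=0.1, t_attr=0.5):
    """Flag inputs whose prediction and attribution are unusually noise-sensitive."""
    x_noisy = x + sigma * torch.randn_like(x)
    with torch.no_grad():
        d_pred = (F.softmax(model(x), dim=-1)
                  - F.softmax(model(x_noisy), dim=-1)).abs().sum(dim=-1)
    d_attr = (_attribution(model, x) - _attribution(model, x_noisy)).flatten(1).norm(dim=1)
    return (d_pred > t_pred) | (d_attr > t_attr)
```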
- Few-Shot Anomaly Detection with Adversarial Loss for Robust Feature Representations [8.915958745269442]
Anomaly detection is a critical and challenging task that aims to identify data points deviating from normal patterns and distributions within a dataset.
Various methods have been proposed using a one-class-one-model approach, but these techniques often face practical problems such as memory inefficiency and the requirement of sufficient data for training.
We propose a few-shot anomaly detection method that integrates adversarial training loss to obtain more robust and generalized feature representations.
arXiv Detail & Related papers (2023-12-04T09:45:02Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework in which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering and the self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore adversarial detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
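The self-similarity idea can be pictured with a minimal sketch: if a class's support set contains adversarial examples, its (optionally autoencoder-filtered) features tend to be less mutually similar. The feature extractor, the cosine-similarity statistic, and the threshold below are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def class_self_similarity(features):
    """Mean pairwise cosine similarity of one class's support features (n_shots, d)."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.t()
    off_diagonal = sim[~torch.eye(f.size(0), dtype=torch.bool)]
    return off_diagonal.mean()

def flag_suspicious_supports(features_per_class, threshold=0.5):
    """Mark classes whose support set looks internally inconsistent."""
    return [class_self_similarity(f) < threshold for f in features_per_class]
```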
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We consider non-robust features a common property of adversarial examples and deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
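A minimal sketch of the likelihood-based detector described above, assuming the clean and adversarial representation clusters are each modelled as a Gaussian over penultimate-layer features; the Gaussian choice and the simple likelihood comparison are simplifications, not the paper's estimator.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_cluster(representations):
    """Fit a Gaussian to a (n_samples, d) array of representations."""
    mu = representations.mean(axis=0)
    cov = np.cov(representations, rowvar=False) + 1e-6 * np.eye(representations.shape[1])
    return multivariate_normal(mean=mu, cov=cov)

def is_adversarial(representation, clean_dist, adv_dist):
    """Assign the representation to whichever cluster gives the higher likelihood."""
    return adv_dist.logpdf(representation) > clean_dist.logpdf(representation)
```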
- ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
By applying the adversarial training objective to a classifier and a rejection function simultaneously, the model can abstain from classification when it lacks sufficient confidence on a test data point.
arXiv Detail & Related papers (2020-10-24T14:05:03Z)
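A hedged sketch of inference with a rejection option in the spirit of ATRO: the paper trains the classifier and the rejection function jointly under an adversarial objective, whereas the illustration below reduces rejection to a plain confidence threshold at test time.

```python
import torch
import torch.nn.functional as F

REJECT = -1  # sentinel label returned when the model abstains

def predict_with_rejection(model, x, confidence_threshold=0.9):
    """Predict a class, or abstain when the softmax confidence is too low."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)
    confidence, prediction = probs.max(dim=-1)
    prediction[confidence < confidence_threshold] = REJECT
    return prediction
```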
- FADER: Fast Adversarial Example Rejection [19.305796826768425]
Recent defenses have been shown to improve adversarial robustness by detecting anomalous deviations from legitimate training samples at different layer representations.
We introduce FADER, a novel technique for speeding up detection-based methods.
Our experiments show up to a 73x reduction in prototypes compared to the analyzed detectors on the MNIST dataset and up to 50x on CIFAR10.
arXiv Detail & Related papers (2020-10-18T22:00:11Z)
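The reported speed-up can be pictured with a small sketch: compress each class's reference set of deep representations into a handful of prototypes (here via k-means, an assumed choice) and reject inputs whose representation is far from every prototype. The clustering method and the distance threshold are not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_prototypes(class_representations, n_prototypes=10):
    """Replace each class's full reference set with k-means centroids."""
    return {
        label: KMeans(n_clusters=n_prototypes, n_init=10).fit(reps).cluster_centers_
        for label, reps in class_representations.items()
    }

def reject(representation, prototypes, threshold):
    """Flag a test representation that is far from all class prototypes."""
    all_protos = np.vstack(list(prototypes.values()))
    distances = np.linalg.norm(all_protos - representation, axis=1)
    return distances.min() > threshold
```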
This list is automatically generated from the titles and abstracts of the papers in this site.