Salient Feature Extractor for Adversarial Defense on Deep Neural
Networks
- URL: http://arxiv.org/abs/2105.06807v1
- Date: Fri, 14 May 2021 12:56:06 GMT
- Title: Salient Feature Extractor for Adversarial Defense on Deep Neural
Networks
- Authors: Jinyin Chen, Ruoxi Chen, Haibin Zheng, Zhaoyan Ming, Wenrong Jiang and
Chen Cui
- Abstract summary: Motivated by the observation that adversarial examples are due to the non-robust features learned from the original dataset by models, we propose the concepts of salient feature (SF) and trivial feature (TF).
We put forward a novel detection and defense method named salient feature extractor (SFE) to defend against adversarial attacks.
- Score: 2.993911699314388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed unprecedented success achieved by deep learning
models in the field of computer vision. However, their vulnerability to
carefully crafted adversarial examples has also attracted increasing
attention from researchers. Motivated by the observation that adversarial
examples arise from the non-robust features that models learn from the original
dataset, we propose the concepts of salient feature (SF) and trivial feature (TF).
The former represents the class-related feature, while the latter is usually
adopted to mislead the model. We extract these two features with a coupled
generative adversarial network model and put forward a novel detection and
defense method named salient feature extractor (SFE) to defend against
adversarial attacks. Concretely, detection is realized by separating the SF
and TF of an input and comparing the difference between them. At the same
time, the correct label is recovered by re-identifying the SF, which achieves
the defense. Extensive experiments on the MNIST, CIFAR-10, and ImageNet
datasets show that SFE achieves state-of-the-art effectiveness and efficiency
compared with baselines. Furthermore, we provide an interpretable account of
the detection and defense process.
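As an illustration of the pipeline described in the abstract, the following minimal sketch assumes a pre-trained salient-feature extractor `g_sf` (a stand-in for the paper's coupled-GAN extractor, which is not reproduced here); the L1 prediction gap and the threshold `tau` are illustrative choices, not the authors' exact criterion.

```python
import torch
import torch.nn.functional as F

def sfe_detect_and_defend(x, model, g_sf, tau=0.5):
    """Illustrative SFE-style pipeline (not the authors' implementation).

    x     : input batch, shape (B, C, H, W)
    model : the classifier under attack
    g_sf  : salient-feature extractor returning the SF component of x
            (a stand-in for the paper's coupled-GAN extractor)
    tau   : detection threshold on the SF/input prediction gap
    """
    with torch.no_grad():
        sf = g_sf(x)       # salient (class-related) component
        tf = x - sf        # trivial residual, the part attacks exploit

        p_x = F.softmax(model(x), dim=1)
        p_sf = F.softmax(model(sf), dim=1)

        # Detection: a large disagreement between the prediction on x and
        # the prediction on its SF component suggests the trivial residual
        # tf is steering the model, i.e. x is likely adversarial.
        gap = (p_x - p_sf).abs().sum(dim=1)
        is_adv = gap > tau

        # Defense: re-identify the label from the SF component alone.
        recovered_label = p_sf.argmax(dim=1)
    return is_adv, recovered_label
```

Detection flags inputs whose prediction changes markedly once the trivial component is stripped away; defense simply trusts the label predicted from the salient component.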
Related papers
- MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models.
Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs.
Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
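A minimal sketch of the MirrorCheck idea above, assuming placeholder callables `vlm_caption` (the target VLM), `t2i_generate` (a text-to-image model), and `embed` (an image encoder such as CLIP); the similarity threshold is illustrative.

```python
import torch.nn.functional as F

def mirrorcheck_detect(image, vlm_caption, t2i_generate, embed, tau=0.7):
    """Sketch of MirrorCheck-style detection (placeholder components).

    vlm_caption : image -> caption produced by the target VLM
    t2i_generate: caption -> image from a text-to-image model
    embed       : image -> feature vector (e.g. a CLIP image encoder)
    """
    caption = vlm_caption(image)     # caption from the (possibly attacked) VLM
    mirror = t2i_generate(caption)   # regenerate an image from that caption
    sim = F.cosine_similarity(embed(image), embed(mirror), dim=-1)
    # An adversarial input yields a caption that no longer matches the
    # image, so the regenerated image drifts away from the original.
    return sim < tau                 # True -> flag as adversarial
```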
- Investigating Human-Identifiable Features Hidden in Adversarial Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
However, FL faces vulnerabilities such as poisoning attacks, which undermine model integrity through both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
Our defense, MESAS, is the first that is robust against strong adaptive adversaries; it is effective in real-world data scenarios and incurs an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
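MESAS's actual metric set is not spelled out in the summary above, so the sketch below only illustrates the general pattern of screening federated-learning client updates with several complementary metrics; the three metrics and the robust z-score cutoff are assumptions, not the paper's design.

```python
import numpy as np

def screen_updates(updates, z=2.5):
    """Illustrative multi-metric screening of FL client updates
    (the metric set is a stand-in, not MESAS's actual one).

    updates: list of 1-D numpy arrays, one flattened update per client
    z      : robust z-score cutoff applied to every metric
    """
    U = np.stack(updates)
    mean_update = U.mean(axis=0)
    norms = np.linalg.norm(U, axis=1)
    metrics = np.stack([
        norms,                                                 # magnitude
        U @ mean_update / (norms * np.linalg.norm(mean_update) + 1e-12),
        U.var(axis=1),                                         # dispersion
    ], axis=1)
    # Flag a client if it is an outlier under ANY metric, using a
    # median/MAD z-score so a few poisoned clients cannot skew the scale.
    med = np.median(metrics, axis=0)
    mad = np.median(np.abs(metrics - med), axis=0) + 1e-12
    outlier = (np.abs(metrics - med) / mad > z).any(axis=1)
    return [u for u, bad in zip(updates, outlier) if not bad]
```

Combining several views of each update is what makes single-objective adaptation hard: an adversary tuned to pass one metric tends to stand out under another.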
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
The vulnerability of deep neural networks to adversarial perturbations is widely recognized in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
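As a rough illustration of a spatial-frequency discriminative detector, the sketch below substitutes a block-DCT decomposition for the paper's Krawtchouk decomposition (both expose how perturbation energy spreads across spatial frequencies); `X_nat` and `X_adv` are assumed arrays of same-sized grayscale training images.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.linear_model import LogisticRegression

def spatial_frequency_features(img, block=8):
    """Per-block frequency-energy features of a grayscale image.

    A stand-in for the paper's Krawtchouk decomposition.
    """
    h, w = img.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coef = dctn(img[i:i + block, j:j + block], norm='ortho')
            feats.append(np.abs(coef).ravel())
    return np.concatenate(feats)

def fit_detector(X_nat, X_adv):
    """Train a discriminative natural-vs-adversarial detector."""
    X = np.stack([spatial_frequency_features(x)
                  for x in list(X_nat) + list(X_adv)])
    y = np.array([0] * len(X_nat) + [1] * len(X_adv))
    return LogisticRegression(max_iter=1000).fit(X, y)
```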
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
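A minimal sketch of a pre-processing defense in the spirit of RTFE, assuming an illustrative residual denoiser architecture (the paper's actual network is not reproduced) placed in front of a frozen, pre-trained classifier.

```python
import torch.nn as nn

class DenoisingFrontEnd(nn.Module):
    """Illustrative RTFE-style pre-processor: a small convolutional
    denoiser whose output is fed to a frozen classifier. The
    architecture here is a guess, not the paper's."""

    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # residual correction of the input

def defend(x, frontend, classifier):
    classifier.eval()
    for p in classifier.parameters():   # the protected model stays frozen
        p.requires_grad_(False)
    return classifier(frontend(x))
```

Because only the front end is trained, the same extractor can in principle be moved in front of other independently trained networks, which is the transferability the title refers to.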
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
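A minimal sketch of a contrastive agreement loss in the output space, in the spirit of ASSUDA; pooling the segmentation logits to one vector per image and the InfoNCE-style form are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def agreement_loss(out_clean, out_adv, temperature=0.1):
    """Illustrative contrastive agreement loss in the output space.

    out_clean, out_adv: (B, C, H, W) segmentation logits for clean
    images and their adversarial counterparts.
    """
    # Pool each prediction map to one vector per image and normalize.
    z1 = F.normalize(out_clean.mean(dim=(2, 3)), dim=1)   # (B, C)
    z2 = F.normalize(out_adv.mean(dim=(2, 3)), dim=1)     # (B, C)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Each clean output should match its own adversarial counterpart
    # (the diagonal) and repel every other sample in the batch.
    return F.cross_entropy(logits, labels)
```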
- Selective and Features based Adversarial Example Detection [12.443388374869745]
Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small perturbations crafted to generate Adversarial Examples (AEs).
We propose a novel unsupervised detection mechanism that uses the selective prediction, processing model layers outputs, and knowledge transfer concepts in a multi-task learning setting.
Experimental results show that the proposed approach achieves results comparable to state-of-the-art methods against the tested attacks in the white-box scenario, and better results in the black-box and gray-box scenarios.
arXiv Detail & Related papers (2021-03-09T11:06:15Z)
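A simplified sketch of selective-prediction-style detection, assuming a placeholder auxiliary head `aux_head` trained on an intermediate layer exposed by `features`; the paper's multi-task and knowledge-transfer machinery is not reproduced.

```python
import torch
import torch.nn.functional as F

def selective_detect(x, model, aux_head, features, tau=0.9):
    """Illustrative selective-prediction detector (a simplification
    of the paper's design; all names are placeholders).

    model    : protected classifier
    aux_head : auxiliary classifier trained on an intermediate layer
    features : callable returning that intermediate layer's output
    tau      : confidence threshold below which the input is rejected
    """
    with torch.no_grad():
        p_main = F.softmax(model(x), dim=1)
        p_aux = F.softmax(aux_head(features(x)), dim=1)
    conf = p_main.max(dim=1).values
    agree = p_main.argmax(dim=1) == p_aux.argmax(dim=1)
    # Reject (flag as adversarial) when the model is unsure or when the
    # main and auxiliary views of the input disagree.
    return (conf < tau) | ~agree
```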
- Mitigating the Impact of Adversarial Attacks in Very Deep Networks [10.555822166916705]
Deep Neural Network (DNN) models have security-related vulnerabilities.
Data-poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models.
We propose an attack-agnostic defense method to mitigate their influence.
arXiv Detail & Related papers (2020-12-08T21:25:44Z)
- Detection Defense Against Adversarial Attacks with Saliency Map [7.736844355705379]
It is well established that neural networks are vulnerable to adversarial examples, which are almost imperceptible to human vision.
Existing defenses tend to harden the robustness of models against adversarial attacks.
We propose a novel method that adds extra noise and exploits prediction inconsistency to detect adversarial examples.
arXiv Detail & Related papers (2020-09-06T13:57:17Z)
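A simplified reading of the inconsistency test above: clean inputs tend to keep their label under small random noise, while adversarial ones sit near a decision boundary and flip. The noise scale, sample count, and flip-rate threshold below are illustrative choices.

```python
import torch

def noise_inconsistency_detect(x, model, sigma=0.05, n=8, tau=0.5):
    """Illustrative inconsistency-under-noise detector
    (a simplification of the paper's method).

    x: input batch, shape (B, C, H, W)
    """
    with torch.no_grad():
        base = model(x).argmax(dim=1)                  # (B,) reference labels
        flips = torch.zeros_like(base, dtype=torch.float)
        for _ in range(n):
            noisy = x + sigma * torch.randn_like(x)
            flips += (model(noisy).argmax(dim=1) != base).float()
    # Flag inputs whose predicted label is unstable under noise.
    return flips / n > tau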
- Heat and Blur: An Effective and Fast Defense Against Adversarial Examples [2.2843885788439797]
We propose a simple defense that combines feature visualization with input modification.
We use these heatmaps as a basis for our defense, in which the adversarial effects are corrupted by massive blurring.
We also provide a new evaluation metric that can capture the effects of both attacks and defenses more thoroughly and descriptively.
arXiv Detail & Related papers (2020-03-17T08:11:18Z)
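A minimal sketch of heatmap-guided blurring in the spirit of Heat and Blur; the gradient-magnitude heatmap and the linear blend are stand-ins for the paper's feature-visualization heatmaps and blurring scheme.

```python
import torchvision.transforms.functional as TF

def heat_and_blur(x, model, kernel=21, sigma=8.0):
    """Illustrative heatmap-guided blurring defense (a simplified
    reading of Heat and Blur).

    x: (B, C, H, W) input batch.
    """
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    logits.max(dim=1).values.sum().backward()
    # Gradient-magnitude heatmap, normalized to [0, 1] per image.
    heat = x.grad.abs().mean(dim=1, keepdim=True)          # (B, 1, H, W)
    heat = heat / (heat.amax(dim=(2, 3), keepdim=True) + 1e-12)
    # Massive blurring corrupts fine-grained adversarial structure; the
    # heatmap decides how strongly each pixel is replaced by its blur.
    blurred = TF.gaussian_blur(x.detach(), kernel_size=kernel, sigma=sigma)
    return heat * x.detach() + (1 - heat) * blurred
```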