Less is More: Feature Selection for Adversarial Robustness with
Compressive Counter-Adversarial Attacks
- URL: http://arxiv.org/abs/2106.10252v1
- Date: Fri, 18 Jun 2021 17:39:05 GMT
- Title: Less is More: Feature Selection for Adversarial Robustness with
Compressive Counter-Adversarial Attacks
- Authors: Emre Ozfatura and Muhammad Zaid Hameed and Kerem Ozfatura and Deniz
Gunduz
- Abstract summary: We propose a novel approach to identify the important features by employing counter-adversarial attacks.
We show that there exists a subset of features, classification based on which bridges the gap between clean and robust accuracy.
We then select features by observing the consistency of the activation values at the penultimate layer.
- Score: 7.5320132424481505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common observation regarding adversarial attacks is that they mostly give
rise to false activation at the penultimate layer to fool the classifier.
Assuming that these activation values correspond to certain features of the
input, the objective becomes choosing the features that are most useful for
classification. Hence, we propose a novel approach to identify the important
features by employing counter-adversarial attacks, which highlights the
consistency at the penultimate layer with respect to perturbations on input
samples. First, we empirically show that there exists a subset of features,
classification based on which bridges the gap between clean and robust
accuracy. Second, we propose a simple yet efficient mechanism to identify those
features by searching the neighborhood of the input sample. We then select features
by observing the consistency of the activation values at the penultimate layer.
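A minimal sketch of this selection criterion follows, assuming a PyTorch model with a hypothetical `penultimate` accessor for its pre-logit activations. The paper probes the neighborhood with counter-adversarial attacks; this sketch uses random perturbations and a variance-based consistency measure for brevity, so it illustrates the idea rather than the authors' exact procedure.

```python
import torch

def select_consistent_features(model, x, n_neighbors=8, eps=0.03, keep_ratio=0.5):
    """Probe the neighborhood of x and keep the penultimate-layer features
    whose activations stay most consistent across the perturbed copies."""
    model.eval()
    with torch.no_grad():
        acts = []
        for _ in range(n_neighbors):
            delta = eps * torch.empty_like(x).uniform_(-1, 1)  # random neighbor of x
            # model.penultimate is a hypothetical accessor for pre-logit activations
            acts.append(model.penultimate((x + delta).clamp(0, 1)))
        acts = torch.stack(acts)                    # (n_neighbors, batch, features)
        inconsistency = acts.var(dim=0).mean(dim=0) # per-feature activation variance
        k = int(keep_ratio * inconsistency.numel())
        keep = inconsistency.argsort()[:k]          # indices of most consistent features
    return keep
```

At inference, one would, e.g., mask the non-selected activations to zero before the final linear layer so that classification relies only on the consistent features.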
Related papers
- Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features [115.33889811527533]
Diffusion models were initially designed for image generation.
Recent research shows that the internal signals within their backbones, termed activations, can also serve as dense features for various discriminative tasks.
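As an illustration of reading out such activations, a minimal PyTorch forward-hook sketch; the toy backbone and layer names below stand in for a diffusion U-Net and are not the paper's evaluated configuration.

```python
import torch
import torch.nn as nn

def collect_activations(model, layers, x):
    """Capture intermediate activations via forward hooks."""
    feats, handles = {}, []
    for name, module in model.named_modules():
        if name in layers:
            handles.append(module.register_forward_hook(
                lambda m, i, out, name=name: feats.__setitem__(name, out.detach())))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return feats  # dict: layer name -> activation tensor

# Toy backbone standing in for a diffusion U-Net block:
toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 8, 3, padding=1))
acts = collect_activations(toy, {"0", "1"}, torch.randn(1, 3, 32, 32))
```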
arXiv Detail & Related papers (2024-10-04T16:05:14Z)
- Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector [30.23453108681447]
Inherently explainable attribution methods aim to enhance the understanding of model behavior.
It is achieved by cooperatively training a selector (generating an attribution map to identify important features) and a predictor.
We introduce a new objective that discourages the presence of discriminative features in the masked-out regions.
Our model achieves higher accuracy than the regular black-box model.
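A minimal sketch of the cooperative selector/predictor objective, with toy modules; the KL-to-uniform penalty used here as the mechanism that discourages discriminative features in the masked-out region is an assumption, and the paper's exact objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

selector = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())  # attribution map
predictor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))      # toy classifier

def step(x, y):
    mask = selector(x)                        # soft mask in [0, 1]
    keep_logits = predictor(x * mask)         # predict from the selected region
    drop_logits = predictor(x * (1 - mask))   # masked-out complement
    uniform = torch.full_like(drop_logits, 1.0 / drop_logits.size(1))
    return (F.cross_entropy(keep_logits, y)   # selected features should suffice
            + F.kl_div(F.log_softmax(drop_logits, -1), uniform,
                       reduction="batchmean"))  # no signal in the masked-out region

loss = step(torch.randn(4, 3, 8, 8), torch.randint(0, 10, (4,)))
loss.backward()
```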
arXiv Detail & Related papers (2024-07-27T17:45:20Z)
- Greedy feature selection: Classifier-dependent feature selection via greedy methods [2.4374097382908477]
The purpose of this study is to introduce a new approach to feature ranking for classification tasks, referred to in what follows as greedy feature selection.
The benefits of such a scheme are investigated theoretically in terms of model capacity indicators, such as the Vapnik-Chervonenkis (VC) dimension or the kernel alignment.
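A generic sketch of classifier-dependent greedy forward selection using scikit-learn; the logistic-regression classifier and cross-validation score are illustrative choices, not the paper's kernel-based setting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_select(X, y, k):
    """Greedily add the feature that most improves classifier CV accuracy."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        scores = [(cross_val_score(LogisticRegression(max_iter=500),
                                   X[:, selected + [j]], y, cv=3).mean(), j)
                  for j in remaining]
        best_score, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = (X[:, 3] + X[:, 7] > 0).astype(int)
print(greedy_select(X, y, 2))  # likely picks features 3 and 7
```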
arXiv Detail & Related papers (2024-03-08T08:12:05Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
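A minimal sketch of the disentanglement idea under strong assumptions: adversarial features are split into a task-relevant part that is aligned with clean features and a residual part that absorbs the feature gap. The module names and loss terms are illustrative, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, n_classes = 32, 10
disentangler = nn.Linear(feat_dim, 2 * feat_dim)  # splits adversarial features in two
classifier = nn.Linear(feat_dim, n_classes)

def disentangle_loss(f_clean, f_adv, y):
    f_task, f_gap = disentangler(f_adv).chunk(2, dim=-1)
    return (F.cross_entropy(classifier(f_task), y)  # classify on the clean-like part
            + F.mse_loss(f_task, f_clean)           # align it with the clean features
            + F.mse_loss(f_task + f_gap, f_adv))    # let the residual absorb the gap

loss = disentangle_loss(torch.randn(4, feat_dim), torch.randn(4, feat_dim),
                        torch.randint(0, n_classes, (4,)))
loss.backward()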
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- Learning Classifiers of Prototypes and Reciprocal Points for Universal Domain Adaptation [79.62038105814658]
Universal Domain Adaptation aims to transfer knowledge between datasets by handling two shifts: domain-shift and category-shift.
The main challenge is correctly distinguishing the unknown target samples while adapting the distribution of known-class knowledge from source to target.
Most existing methods approach this problem by first training a target-adapted known-class classifier and then relying on a single threshold to distinguish unknown target samples.
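A sketch of the single-threshold scheme the summary attributes to prior methods (not the paper's prototype/reciprocal-point approach); the max-softmax confidence measure and threshold value are assumptions.

```python
import torch
import torch.nn.functional as F

def classify_with_unknown(logits, threshold=0.5):
    """Mark target samples as unknown (-1) when max softmax confidence
    falls below a single global threshold; else predict the known class."""
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    pred[conf < threshold] = -1  # -1 denotes an unknown-category sample
    return pred

# First row is confident (known class), second is near-uniform (unknown):
print(classify_with_unknown(torch.tensor([[4.0, 0.1, 0.2],
                                          [0.4, 0.5, 0.45]])))  # tensor([ 0, -1])
```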
arXiv Detail & Related papers (2022-12-16T09:01:57Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
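A minimal sketch of a self-similarity score over a class's support features; the cosine-similarity criterion and threshold-based flagging are illustrative assumptions, not the paper's exact detector.

```python
import torch
import torch.nn.functional as F

def self_similarity(support_feats):
    """Mean pairwise cosine similarity within a class's support features;
    adversarial supports tend to score lower (illustrative criterion)."""
    f = F.normalize(support_feats, dim=-1)
    sim = f @ f.t()
    n = f.size(0)
    return sim[~torch.eye(n, dtype=torch.bool)].mean()  # off-diagonal mean

clean = torch.randn(5, 64) + 3.0     # roughly aligned toy support set
print(self_similarity(clean))        # flag the set if this falls below a threshold
```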
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering, together with the concept of self-similarity of a support set, to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore adversarial detection for few-shot classifiers.
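A toy sketch of the filtering step: supports are passed through a feature-preserving autoencoder (here an untrained stand-in), and a large shift in embedding space marks the set as suspicious. Module shapes and the shift criterion are placeholders.

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(  # toy stand-in for a feature-preserving autoencoder
    nn.Flatten(), nn.Linear(784, 64), nn.ReLU(),
    nn.Linear(64, 784), nn.Unflatten(1, (1, 28, 28)))

def filtering_shift(embed, support):
    """Distance between support embeddings before and after AE filtering;
    adversarial perturbations tend not to survive the bottleneck."""
    with torch.no_grad():
        return (embed(support) - embed(autoencoder(support))).norm(dim=-1)

embed = nn.Sequential(nn.Flatten(), nn.Linear(784, 32))  # toy feature extractor
print(filtering_shift(embed, torch.rand(5, 1, 28, 28)))  # large shift => suspicious
```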
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we treat non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage this distribution for a likelihood-based adversarial detector.
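A minimal sketch of such a likelihood-based detector, assuming a single Gaussian fitted to representations of known adversarial examples; the paper's density model and decision rule may differ.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_adv_cluster(adv_reps):
    """Fit a Gaussian to representations of known adversarial examples."""
    mu = adv_reps.mean(axis=0)
    cov = np.cov(adv_reps, rowvar=False) + 1e-4 * np.eye(adv_reps.shape[1])
    return multivariate_normal(mean=mu, cov=cov)

rng = np.random.default_rng(0)
adv_cluster = fit_adv_cluster(rng.normal(2.0, 0.5, size=(200, 8)))
score = adv_cluster.logpdf(rng.normal(0.0, 0.5, size=8))  # a clean-like sample
print(score)  # flag inputs whose log-likelihood under the adversarial cluster is high
```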
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
We also provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and to compute a feature ranking.
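For intuition, reachable sets can be over-approximated by propagating interval bounds layer by layer; this sketch is a coarse stand-in for the paper's reachability analysis, not its method.

```python
import numpy as np

def interval_linear(W, b, lo, hi):
    """Propagate an axis-aligned box through an affine layer exactly."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

W1, b1 = np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])  # input perturbation box
lo, hi = interval_relu(*interval_linear(W1, b1, lo, hi))
print(lo, hi)  # the layer's reachable set lies inside this output box
```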
arXiv Detail & Related papers (2020-07-28T10:58:06Z)
- Differentiable Unsupervised Feature Selection based on a Gated Laplacian [7.970954821067042]
We propose a differentiable loss function that combines the Laplacian score, which favors low-frequency features, with a gating mechanism for feature selection.
We mathematically motivate the proposed approach and demonstrate that in the high noise regime, it is crucial to compute the Laplacian on the gated inputs, rather than on the full feature set.
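A minimal sketch of the key point, computing the Laplacian score with the similarity graph built on gated inputs rather than the full feature set; the RBF kernel, bandwidth, and fixed gates are illustrative (the paper learns the gates differentiably).

```python
import numpy as np

def laplacian_score(X, gates, sigma=1.0):
    """Laplacian score per feature; lower scores indicate smoother
    (lower-frequency) features on the graph of gated inputs."""
    Xg = X * gates                                 # apply soft feature gates
    d2 = ((Xg[:, None, :] - Xg[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))             # RBF similarity graph
    D = np.diag(S.sum(1))
    L = D - S                                      # graph Laplacian
    f = X - X.mean(0)                              # centered features
    return (np.einsum('ij,ij->j', f, L @ f)
            / np.einsum('ij,ij->j', f, D @ f))     # f_j^T L f_j / f_j^T D f_j

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
print(laplacian_score(X, gates=np.array([1.0, 1.0, 0.1, 0.1])))
```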
arXiv Detail & Related papers (2020-07-09T11:58:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.