Metrics and methods for robustness evaluation of neural networks with
generative models
- URL: http://arxiv.org/abs/2003.01993v2
- Date: Sun, 15 Mar 2020 15:55:23 GMT
- Title: Metrics and methods for robustness evaluation of neural networks with
generative models
- Authors: Igor Buzhinsky, Arseny Nerinovsky, Stavros Tripakis
- Abstract summary: Recently, especially in computer vision, researchers have discovered "natural" or "semantic" perturbations, such as rotations, changes of brightness, or other higher-level changes.
We propose several metrics to measure the robustness of classifiers to natural adversarial examples, and methods to evaluate them.
- Score: 0.07366405857677225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that modern deep neural network classifiers are
easy to fool, assuming that an adversary is able to slightly modify their
inputs. Many papers have proposed adversarial attacks, defenses and methods to
measure robustness to such adversarial perturbations. However, most commonly
considered adversarial examples are based on $\ell_p$-bounded perturbations in
the input space of the neural network, which are unlikely to arise naturally.
Recently, especially in computer vision, researchers have discovered "natural" or
"semantic" perturbations, such as rotations, changes of brightness, or other
higher-level changes, but these perturbations have not yet been systematically
used to measure the performance of classifiers. In this paper, we propose
several metrics to measure the robustness of classifiers to natural adversarial
examples, and methods to evaluate them. These metrics, called latent space
performance metrics, are based on the ability of generative models to capture
probability distributions, and are defined in their latent spaces. On three
image classification case studies, we evaluate the proposed metrics for several
classifiers, including ones trained in conventional and robust ways. We find
that the latent counterparts of adversarial robustness are associated with the
accuracy of the classifier rather than its conventional adversarial robustness,
but the latter is still reflected in the properties of the latent perturbations
that we find. In addition, our novel method of finding latent adversarial
perturbations demonstrates that these perturbations are often perceptually
small.
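The abstract does not spell out the search procedure, but the general idea of finding latent adversarial perturbations can be illustrated with a short sketch: optimize a perturbation of a latent code so that the image decoded by a generative model changes the classifier's prediction while the latent perturbation stays small. The following is a minimal PyTorch-style sketch under these assumptions; the generator `G`, classifier `f`, loss weighting, and optimizer settings are illustrative and not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def find_latent_perturbation(G, f, z0, y_true, steps=200, lr=0.05, lam=1.0):
    """Search for a small latent perturbation dz so that the decoded image
    G(z0 + dz) is no longer classified as y_true.

    G (generator/decoder), f (classifier), z0 (latent code of the original,
    correctly classified image, shape [1, latent_dim]) and all hyperparameters
    are illustrative assumptions, not the paper's exact interface or procedure.
    """
    dz = torch.zeros_like(z0, requires_grad=True)
    opt = torch.optim.Adam([dz], lr=lr)
    for _ in range(steps):
        logits = f(G(z0 + dz))              # classify the decoded image
        if logits.argmax(dim=1).item() != y_true.item():
            break                           # decoded image is now misclassified
        # Push the prediction away from y_true while keeping dz small.
        loss = -F.cross_entropy(logits, y_true) + lam * dz.norm()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return dz.detach()                      # latent adversarial perturbation (if found)
```

Under this sketch, a latent-space robustness statistic could be obtained, for example, by aggregating the norms of the perturbations found over a test set, and the abstract's observation that such perturbations are "often perceptually small" could be checked by visually comparing G(z0) with G(z0 + dz). These aggregation and inspection choices are assumptions for illustration only.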
Related papers
- Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis [12.133306321357999]
We propose an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation.
We conduct a detailed analysis of uncertainty-based detection of adversarial attacks across various state-of-the-art neural networks.
Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method.
arXiv Detail & Related papers (2024-08-19T14:13:30Z)
- Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement [61.048842737581865]
Adversarial fine-tuning methods aim to enhance adversarial robustness by fine-tuning a naturally pre-trained model in an adversarial training manner.
We propose a disentanglement-based approach to explicitly model and remove the latent features that cause the feature gap.
Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
arXiv Detail & Related papers (2024-01-26T08:38:57Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high-dimensional input data.
We introduce a simple, generic and generalisable framework in which the key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Residual Error: a New Performance Measure for Adversarial Robustness [85.0371352689919]
A major challenge that limits the widespread adoption of deep learning has been its fragility to adversarial attacks.
This study presents the concept of residual error, a new performance measure for assessing the adversarial robustness of a deep neural network.
Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric.
arXiv Detail & Related papers (2021-06-18T16:34:23Z)
- Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible (a hedged sketch of this masking idea is given after the related-papers list below).
arXiv Detail & Related papers (2021-06-17T03:07:22Z)
- Adversarial Perturbations Are Not So Weird: Entanglement of Robust and Non-Robust Features in Neural Network Classifiers [4.511923587827301]
We show that in a neural network trained in a standard way, non-robust features respond to small, "non-semantic" patterns.
Adversarial examples can be formed via minimal perturbations to these small, entangled patterns.
arXiv Detail & Related papers (2021-02-09T20:21:31Z)
- Closeness and Uncertainty Aware Adversarial Examples Detection in Adversarial Machine Learning [0.7734726150561088]
We explore and assess the use of two different groups of metrics for detecting adversarial samples.
We introduce a new feature for adversarial detection, and we show that the performance of all these metrics depends heavily on the strength of the attack being used.
arXiv Detail & Related papers (2020-12-11T14:44:59Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We consider non-robust features as a common property of adversarial examples and deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Attribute-Guided Adversarial Training for Robustness to Natural Perturbations [64.35805267250682]
We propose an adversarial training approach which learns to generate new samples so as to maximize the classifier's exposure to the attribute space.
Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations.
arXiv Detail & Related papers (2020-12-03T10:17:30Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
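As a concrete illustration of one idea from the list above, the "Localized Uncertainty Attacks" entry describes perturbing only the input regions where the classifier is uncertain. The sketch below is a generic, hedged reconstruction of that idea (a single FGSM-style step restricted by an uncertainty mask); the `uncertainty_map` input, the masking heuristic, and the hyperparameters are assumptions, not the cited paper's actual attack.

```python
import torch
import torch.nn.functional as F

def masked_fgsm(model, x, y, uncertainty_map, top_frac=0.1, eps=8 / 255):
    """One FGSM-style step restricted to the most uncertain input regions.

    `uncertainty_map` is assumed to be a per-pixel uncertainty estimate
    (e.g., MC-dropout variance or softmax entropy); everything here is an
    illustrative sketch, not the cited paper's method.
    """
    # Keep only the top `top_frac` most uncertain pixels.
    k = max(1, int(top_frac * uncertainty_map.numel()))
    threshold = uncertainty_map.flatten().topk(k).values.min()
    mask = (uncertainty_map >= threshold).float()

    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()

    # Apply the sign-gradient step only inside the uncertain region,
    # then clamp back to the valid image range.
    return (x + eps * mask * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```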