Adversarial Examples and Metrics
- URL: http://arxiv.org/abs/2007.06993v2
- Date: Wed, 15 Jul 2020 11:50:21 GMT
- Title: Adversarial Examples and Metrics
- Authors: Nico Döttling, Kathrin Grosse, Michael Backes, Ian Molloy
- Abstract summary: Adversarial examples are a type of attack on machine learning (ML) systems which cause misclassification of inputs.
We study the limitations of robust classification if the target metric is uncertain.
- Score: 14.068394742881425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial examples are a type of attack on machine learning (ML) systems
which cause misclassification of inputs. Achieving robustness against
adversarial examples is crucial to apply ML in the real world. While most prior
work on adversarial examples is empirical, a recent line of work establishes
fundamental limitations of robust classification based on cryptographic
hardness. Most positive and negative results in this field however assume that
there is a fixed target metric which constrains the adversary, and we argue
that this is often an unrealistic assumption. In this work we study the
limitations of robust classification if the target metric is uncertain.
Concretely, we construct a classification problem, which admits robust
classification by a small classifier if the target metric is known at the time
the model is trained, but for which robust classification is impossible for
small classifiers if the target metric is chosen after the fact. In the
process, we explore a novel connection between hardness of robust
classification and bounded storage model cryptography.
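To make the role of a fixed target metric concrete: in the standard threat model, the adversary may move an input only within a ball of fixed radius under a fixed metric such as $\ell_\infty$. The sketch below illustrates only that assumption, not the paper's construction or its bounded storage model argument; the toy linear model, the data, and the radius eps are placeholders.

    # Minimal sketch (not the paper's construction): the usual fixed-metric
    # threat model, where the adversary stays inside an l_inf ball of radius eps.
    import torch
    import torch.nn as nn

    def fgsm_attack(model, x, y, eps=0.1):
        """One-step perturbation constrained to an l_inf ball of radius eps (FGSM)."""
        x = x.clone().detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        # Move each coordinate by eps in the direction that increases the loss;
        # the result lies inside the l_inf ball of radius eps by construction.
        return (x + eps * x.grad.sign()).detach()

    # Hypothetical usage with a toy linear classifier on 4-dimensional inputs.
    model = nn.Linear(4, 2)
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
    x_adv = fgsm_attack(model, x, y, eps=0.1)
    print((x_adv - x).abs().max())  # never exceeds eps, i.e. the metric constraint holds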
Related papers
- Towards Class-wise Robustness Analysis [15.351461000403074]
Exploiting weakly robust classes is a potential avenue for attackers to fool the image recognition models.
This study investigates class-to-class biases across adversarially trained robust classification models.
We find that the number of false positives a class attracts strongly influences its vulnerability when it is chosen as the attack's target class.
arXiv Detail & Related papers (2024-11-29T17:09:59Z) - Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
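One common form of entropy regularisation in this setting, sketched below under the assumption that it resembles the paper's regulariser, maximises the entropy of the average prediction over unlabelled data so the classifier does not collapse onto a few classes; the loss weight and the toy model are placeholders, not the paper's recipe.

    # Hedged sketch of entropy regularisation for category discovery: supervised
    # cross-entropy on labelled data plus a term that maximises the entropy of the
    # average prediction on unlabelled data (an anti-collapse regulariser).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def gcd_style_loss(model, x_lab, y_lab, x_unlab, reg_weight=1.0):
        sup = F.cross_entropy(model(x_lab), y_lab)        # supervised term
        probs = F.softmax(model(x_unlab), dim=1)          # predictions on unlabelled data
        mean_probs = probs.mean(dim=0)                    # marginal class distribution
        entropy = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum()
        return sup - reg_weight * entropy                 # subtracting maximises the entropy

    model = nn.Linear(16, 10)                             # placeholder classifier
    loss = gcd_style_loss(model, torch.randn(32, 16), torch.randint(0, 5, (32,)),
                          torch.randn(64, 16))
    loss.backward()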
arXiv Detail & Related papers (2022-11-21T18:47:11Z) - Towards Fair Classification against Poisoning Attacks [52.57443558122475]
We study the poisoning scenario where the attacker can insert a small fraction of samples into training data.
We propose a general and theoretically guaranteed framework that adapts traditional defense methods to fair classification against poisoning attacks.
arXiv Detail & Related papers (2022-10-18T00:49:58Z) - Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z) - Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$ ball or functional attacks which perturb inputs indiscriminately, our targeted changes can be less perceptible.
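A rough sketch of the localisation idea, under the assumption that a per-coordinate uncertainty map (e.g. from Monte Carlo dropout) is available: apply a gradient-sign step only where that map flags the model as uncertain. The masking rule, threshold tau, and toy data are illustrative, not the paper's attack.

    # Sketch only: perturb just the coordinates flagged as uncertain by a
    # supplied uncertainty map; everything else is left untouched.
    import torch
    import torch.nn as nn

    def localized_attack(model, x, y, uncertainty_map, eps=0.1, tau=0.5):
        mask = (uncertainty_map > tau).float()            # 1 where the model is uncertain
        x = x.clone().detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        # Gradient-sign step applied only inside the uncertain regions.
        return (x + eps * x.grad.sign() * mask).detach()

    model = nn.Linear(4, 2)
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
    u_map = torch.rand(8, 4)                              # stand-in per-coordinate uncertainty
    x_adv = localized_attack(model, x, y, u_map)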
arXiv Detail & Related papers (2021-06-17T03:07:22Z) - Beyond cross-entropy: learning highly separable feature distributions for robust and accurate classification [22.806324361016863]
We propose a novel approach for training deep multiclass classifiers that provides adversarial robustness.
We show that the regularization of the latent space based on our approach yields excellent classification accuracy.
arXiv Detail & Related papers (2020-10-29T11:15:17Z) - ATRO: Adversarial Training with a Rejection Option [10.36668157679368]
This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples.
By applying the adversarial training objective to both a classifier and a rejection function simultaneously, the framework can choose to abstain from classification when the classifier has insufficient confidence on a test data point.
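A minimal sketch of classification with a rejection option: a fixed confidence threshold stands in for ATRO's learned rejection function (that substitution is an assumption), and -1 marks an abstained prediction.

    # Predict a class only when the softmax confidence clears a threshold,
    # otherwise abstain; the threshold replaces a learned rejection function.
    import torch
    import torch.nn.functional as F

    def predict_or_reject(model, x, threshold=0.9):
        probs = F.softmax(model(x), dim=1)
        conf, pred = probs.max(dim=1)
        pred[conf < threshold] = -1                       # -1 marks "abstain"
        return pred

    model = torch.nn.Linear(4, 3)                         # placeholder classifier
    print(predict_or_reject(model, torch.randn(5, 4)))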
arXiv Detail & Related papers (2020-10-24T14:05:03Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Classifier uncertainty: evidence, potential impact, and probabilistic treatment [0.0]
We present an approach to quantify the uncertainty of classification performance metrics based on a probability model of the confusion matrix.
We show that uncertainties can be surprisingly large and limit performance evaluation.
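A generic Bayesian sketch of the idea (not necessarily the paper's exact model): treat the confusion-matrix counts as a multinomial sample, place a Dirichlet posterior over the cell probabilities, and push posterior samples through a metric such as accuracy to obtain a credible interval. The counts, prior, and interval width below are illustrative.

    # Dirichlet posterior over confusion-matrix cell probabilities, propagated
    # through the accuracy metric to quantify its uncertainty.
    import numpy as np

    rng = np.random.default_rng(0)
    confusion = np.array([[45, 5],      # rows: true class, columns: predicted class
                          [10, 40]], dtype=float)

    # Multinomial likelihood over the 4 cells + uniform Dirichlet(1) prior.
    samples = rng.dirichlet(confusion.flatten() + 1.0, size=10_000)
    accuracy = samples[:, 0] + samples[:, 3]              # probability mass on the diagonal
    lo, hi = np.percentile(accuracy, [2.5, 97.5])
    print(f"accuracy ~ {accuracy.mean():.3f}, 95% credible interval [{lo:.3f}, {hi:.3f}]")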
arXiv Detail & Related papers (2020-06-19T12:49:19Z) - Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
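For context, a minimal sketch of the general randomized-smoothing mechanism: the smoothed classifier returns the majority vote of a base classifier over noisy copies of its argument. The paper applies this idea to noise over training labels rather than test inputs; the input-noise version below is only the textbook illustration, with a placeholder base model.

    # Smoothed prediction = majority vote of the base classifier over
    # Gaussian-noised copies of the input.
    import torch

    def smoothed_predict(base_model, x, sigma=0.25, n_samples=1000):
        noisy = x.unsqueeze(0) + sigma * torch.randn(n_samples, *x.shape)
        votes = base_model(noisy).argmax(dim=1)           # class vote per noisy copy
        return torch.bincount(votes).argmax()             # majority class

    base_model = torch.nn.Linear(4, 3)                    # placeholder base classifier
    print(smoothed_predict(base_model, torch.randn(4)))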
arXiv Detail & Related papers (2020-02-07T21:28:30Z)