A Bayes-Optimal View on Adversarial Examples
- URL: http://arxiv.org/abs/2002.08859v2
- Date: Wed, 17 Mar 2021 09:47:10 GMT
- Title: A Bayes-Optimal View on Adversarial Examples
- Authors: Eitan Richardson and Yair Weiss
- Abstract summary: We argue for examining adversarial examples from the perspective of Bayes-optimal classification.
Our results show that even when these "gold standard" optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier.
- Score: 9.51828574518325
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Since the discovery of adversarial examples (the ability to fool modern CNN
classifiers with tiny perturbations of the input), there has been much
discussion of whether they are a "bug" that is specific to current neural
architectures and training methods or an inevitable "feature" of
high-dimensional geometry. In this paper, we argue for examining adversarial
examples from the perspective of Bayes-Optimal classification. We construct
realistic image datasets for which the Bayes-Optimal classifier can be
efficiently computed and derive analytic conditions on the distributions under
which these classifiers are provably robust against any adversarial attack even
in high dimensions. Our results show that even when these "gold standard"
optimal classifiers are robust, CNNs trained on the same datasets consistently
learn a vulnerable classifier, indicating that adversarial examples are often
an avoidable "bug". We further show that RBF SVMs trained on the same data
consistently learn a robust classifier. The same trend is observed in
experiments with real images in different datasets.
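To make the comparison concrete, here is a minimal sketch (not the authors' code) of the kind of experiment the abstract describes: a two-class Gaussian toy dataset whose Bayes-optimal classifier has a closed form and a directly certifiable l2 robustness radius, next to an sklearn RBF SVM evaluated under a crude surrogate perturbation. The dimension, noise level, class means, attack budget, and the surrogate attack direction are illustrative assumptions; the paper's actual datasets are realistic images constructed so that the Bayes-optimal classifier remains efficiently computable.

```python
# Minimal sketch: closed-form Bayes-optimal classifier vs. an RBF SVM on a
# two-class isotropic Gaussian problem. All constants below are assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d, n, sigma, eps = 64, 2000, 1.0, 0.5          # dimension, samples, noise std, l2 budget
mu0, mu1 = np.zeros(d), np.full(d, 0.5)        # assumed class means, equal priors

y = rng.integers(0, 2, n)
X = np.where(y[:, None] == 1, mu1, mu0) + sigma * rng.standard_normal((n, d))

# Bayes-optimal rule for equal isotropic Gaussians: a hyperplane between the means.
w = (mu1 - mu0) / sigma**2
b = (mu0 @ mu0 - mu1 @ mu1) / (2 * sigma**2)
margin = (X @ w + b) / np.linalg.norm(w)       # signed l2 distance to the boundary
bayes_pred = (margin > 0).astype(int)

# A point is robust to ANY l2 perturbation of size eps iff its distance
# to the Bayes boundary exceeds eps (and it is correctly classified).
bayes_acc = np.mean(bayes_pred == y)
bayes_robust = np.mean((bayes_pred == y) & (np.abs(margin) > eps))

# RBF SVM trained on the same samples, attacked with the worst-case
# perturbation for the Bayes rule (a crude surrogate, not a full attack).
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
delta = eps * w / np.linalg.norm(w)
X_adv = X - np.sign(2 * y - 1)[:, None] * delta  # push each point toward the other class
svm_acc = np.mean(svm.predict(X) == y)
svm_robust = np.mean((svm.predict(X) == y) & (svm.predict(X_adv) == y))

print(f"Bayes   acc={bayes_acc:.3f}  robust@eps={bayes_robust:.3f}")
print(f"RBF SVM acc={svm_acc:.3f}  robust@eps(surrogate)={svm_robust:.3f}")
```

Because the Bayes rule is linear here, its robust accuracy at budget eps can be read directly from the margin distribution; the SVM's robustness is only estimated under the single surrogate direction above.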
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z) - Robustness of Deep Neural Networks for Micro-Doppler Radar Classification [1.3654846342364308]
Two deep convolutional architectures, trained and tested on the same data, are evaluated.
Both models are shown to be susceptible to adversarial examples.
Models based on the cadence-velocity diagram representation, rather than Doppler-time, are demonstrated to be naturally more immune to adversarial examples.
arXiv Detail & Related papers (2024-02-21T09:37:17Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Counterfactual Image Generation for adversarially robust and interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adversarial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake".
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z) - Unrestricted Adversarial Samples Based on Non-semantic Feature Clusters Substitution [1.8782750537161608]
We introduce "unrestricted" perturbations that create adversarial samples by using spurious relations learned by model training.
Specifically, we find feature clusters in non-semantic features that are strongly correlated with model judgment results.
We create adversarial samples by using them to replace the corresponding feature clusters in the target image.
arXiv Detail & Related papers (2022-08-31T07:42:36Z) - Smoothed Embeddings for Certified Few-Shot Learning [63.68667303948808]
We extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings.
Our results are confirmed by experiments on different datasets.
arXiv Detail & Related papers (2022-02-02T18:19:04Z) - Efficient and Robust Classification for Sparse Attacks [34.48667992227529]
We consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection.
We propose a novel defense method that consists of "truncation" and "adversarial training".
Motivated by the insights we obtain, we extend these components to neural network classifiers.
arXiv Detail & Related papers (2022-01-23T21:18:17Z) - Benign Overfitting in Adversarially Robust Linear Classification [91.42259226639837]
"Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community.
We show that benign overfitting indeed occurs in adversarial training, a principled approach to defend against adversarial examples.
arXiv Detail & Related papers (2021-12-31T00:27:31Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
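As an illustration of the instance-level attack described in the last entry above, the following sketch (my own, not the paper's code) perturbs one augmented view so that its embedding no longer agrees with its positive pair, without using any labels. The `encoder` argument, the step sizes, and the simple cosine-agreement objective (in place of a full contrastive loss) are assumptions.

```python
# Minimal PyTorch sketch of a label-free, instance-level adversarial perturbation:
# push encoder(view2 + delta) away from encoder(view1) inside an l_inf ball.
import torch
import torch.nn.functional as F

def instance_attack(encoder, view1, view2, eps=8 / 255, alpha=2 / 255, steps=5):
    """PGD that destroys agreement between two augmented views of the same inputs."""
    with torch.no_grad():
        target = F.normalize(encoder(view1), dim=1)      # fixed positive embeddings
    delta = torch.zeros_like(view2, requires_grad=True)
    for _ in range(steps):
        z = F.normalize(encoder(view2 + delta), dim=1)
        agreement = F.cosine_similarity(z, target, dim=1).mean()
        grad = torch.autograd.grad(agreement, delta)[0]
        with torch.no_grad():
            delta -= alpha * grad.sign()                 # descend on agreement
            delta.clamp_(-eps, eps)                      # stay inside the l_inf budget
    # A real implementation would also clamp view2 + delta to the valid image range.
    return (view2 + delta).detach()
```

The perturbed views can then be fed back into a contrastive objective so the network is trained to re-align them with their positives, which is roughly the label-free adversarial training recipe that entry describes.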
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.