Related papers: ATRO: Adversarial Training with a Rejection Option

ATRO: Adversarial Training with a Rejection Option

URL: http://arxiv.org/abs/2010.12905v1
Date: Sat, 24 Oct 2020 14:05:03 GMT
Title: ATRO: Adversarial Training with a Rejection Option
Authors: Masahiro Kato, Zhenghang Cui, Yoshihiro Fukuhara
Abstract summary: This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples. Applying the adversarial training objective to both a classifier and a rejection function simultaneously, we can choose to abstain from classification when it has insufficient confidence to classify a test data point.
Score: 10.36668157679368
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper proposes a classification framework with a rejection option to mitigate the performance deterioration caused by adversarial examples. While recent machine learning algorithms achieve high prediction performance, they are empirically vulnerable to adversarial examples, which are slightly perturbed data samples that are wrongly classified. In real-world applications, adversarial attacks using such adversarial examples could cause serious problems. To this end, various methods are proposed to obtain a classifier that is robust against adversarial examples. Adversarial training is one of them, which trains a classifier to minimize the worst-case loss under adversarial attacks. In this paper, in order to acquire a more reliable classifier against adversarial attacks, we propose the method of Adversarial Training with a Rejection Option (ATRO). Applying the adversarial training objective to both a classifier and a rejection function simultaneously, classifiers trained by ATRO can choose to abstain from classification when it has insufficient confidence to classify a test data point. We examine the feasibility of the framework using the surrogate maximum hinge loss and establish a generalization bound for linear models. Furthermore, we empirically confirmed the effectiveness of ATRO using various models and real-world datasets.

Related papers

Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
Advancing Adversarial Robustness Through Adversarial Logit Update [10.041289551532804]
Adversarial training and adversarial purification are among the most widely recognized defense strategies. We propose a new principle, namely Adversarial Logit Update (ALU), to infer adversarial sample's labels. Our solution achieves superior performance compared to state-of-the-art methods against a wide range of adversarial attacks.
arXiv Detail & Related papers (2023-08-29T07:13:31Z)
Stratified Adversarial Robustness with Rejection [33.72077702550626]
We study adversarially-robust classification with rejection in the stratified rejection setting. We propose a novel defense method -- Adversarial Training with Consistent Prediction-based Rejection (CPR) CPR significantly outperforms existing methods under strong adaptive attacks.
arXiv Detail & Related papers (2023-05-02T01:04:29Z)
Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm. We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance. We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z)
PARL: Enhancing Diversity of Ensemble Networks to Resist Adversarial Attacks via Pairwise Adversarially Robust Loss Function [13.417003144007156]
adversarial attacks tend to rely on the principle of transferability. Ensemble methods against adversarial attacks demonstrate that an adversarial example is less likely to mislead multiple classifiers. Recent ensemble methods have either been shown to be vulnerable to stronger adversaries or shown to lack an end-to-end evaluation.
arXiv Detail & Related papers (2021-12-09T14:26:13Z)
Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks. We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering. Our evaluation on the miniImagenet (MI) and CUB datasets exhibit good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets. We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection. Our method is attack-agnostic and also the first to explore detection for few-shot classifiers to the best of our knowledge.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle. In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize. Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
Adversarial Example Games [51.92698856933169]
Adrial Example Games (AEG) is a framework that models the crafting of adversarial examples. AEG provides a new way to design adversarial examples by adversarially training a generator and aversa from a given hypothesis class. We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z)
Class-Aware Domain Adaptation for Improving Adversarial Robustness [27.24720754239852]
adversarial training has been proposed to train networks by injecting adversarial examples into the training data. We propose a novel Class-Aware Domain Adaptation (CADA) method for adversarial defense without directly applying adversarial training.
arXiv Detail & Related papers (2020-05-10T03:45:19Z)
Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks. We present a unifying view of randomized smoothing over arbitrary functions. We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.