Domain Knowledge Alleviates Adversarial Attacks in Multi-Label
Classifiers
- URL: http://arxiv.org/abs/2006.03833v4
- Date: Wed, 29 Dec 2021 11:45:28 GMT
- Title: Domain Knowledge Alleviates Adversarial Attacks in Multi-Label
Classifiers
- Authors: Stefano Melacci, Gabriele Ciravegna, Angelo Sotgiu, Ambra Demontis,
Battista Biggio, Marco Gori, Fabio Roli
- Abstract summary: Adversarial attacks on machine learning-based classifiers, along with defense mechanisms, have been widely studied.
In this paper, we shift the attention to multi-label classification, where the availability of domain knowledge may offer a natural way to spot incoherent predictions.
We explore this intuition in a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem.
- Score: 34.526394646264734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adversarial attacks on machine learning-based classifiers, along with defense
mechanisms, have been widely studied in the context of single-label
classification problems. In this paper, we shift the attention to multi-label
classification, where the availability of domain knowledge on the relationships
among the considered classes may offer a natural way to spot incoherent
predictions, i.e., predictions associated with adversarial examples lying outside
of the training data distribution. We explore this intuition in a framework in
which first-order logic knowledge is converted into constraints and injected
into a semi-supervised learning problem. Within this setting, the constrained
classifier learns to fulfill the domain knowledge over the marginal
distribution, and can naturally reject samples with incoherent predictions.
Even though our method does not exploit any knowledge of attacks during
training, our experimental analysis surprisingly unveils that domain-knowledge
constraints can help detect adversarial examples effectively, especially if
such constraints are not known to the attacker.
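As a rough illustration of the idea, the sketch below (PyTorch, with made-up class names, rule, loss weight, and rejection threshold, and with the Łukasiewicz relaxation chosen as one common way to soften an implication) shows how a first-order rule such as (dog ∨ cat) ⇒ animal can be turned into a differentiable penalty enforced on unlabeled data and then reused at test time to reject predictions that violate the knowledge. It is a sketch of the general recipe, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

# Hypothetical multi-label setup with three classes: "dog", "cat", "animal".
# The rule (dog OR cat) => animal decomposes into dog => animal and
# cat => animal; the Lukasiewicz relaxation of A => B yields the penalty
# max(0, p_A - p_B).
class MultiLabelNet(nn.Module):
    def __init__(self, in_dim=32, n_classes=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x):
        return torch.sigmoid(self.body(x))  # per-class membership scores in [0, 1]

def constraint_violation(p):
    """Soft violation of the domain-knowledge rules."""
    dog, cat, animal = p[:, 0], p[:, 1], p[:, 2]
    return torch.relu(dog - animal) + torch.relu(cat - animal)

model = MultiLabelNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCELoss()

x_lab, y_lab = torch.randn(16, 32), torch.randint(0, 2, (16, 3)).float()  # toy labeled data
x_unl = torch.randn(64, 32)                                               # toy unlabeled data

for _ in range(100):
    opt.zero_grad()
    loss = bce(model(x_lab), y_lab)                                # supervised term
    loss = loss + 0.5 * constraint_violation(model(x_unl)).mean()  # knowledge term (weight is a guess)
    loss.backward()
    opt.step()

# Rejection at test time: flag inputs whose predictions are incoherent with
# the rules, e.g. adversarial examples pushed off the training manifold.
x_test = torch.randn(8, 32)
with torch.no_grad():
    reject = constraint_violation(model(x_test)) > 0.2  # placeholder threshold
```

Note that the same violation score serves both as a training penalty and as a rejection statistic, which is why attacks that are unaware of the constraints tend to produce incoherent, easily flagged predictions.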
Related papers
- Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation [26.544938760265136]
Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets when making predictions.
We propose a self-guided spurious correlation mitigation framework.
We show that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori.
arXiv Detail & Related papers (2024-05-06T17:12:21Z)
- Adversarial Resilience in Sequential Prediction via Abstention [46.80218090768711]
We study the problem of sequential prediction in the setting with an adversary that is allowed to inject clean-label adversarial examples.
We propose a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings.
arXiv Detail & Related papers (2023-06-22T17:44:22Z)
- Robustly-reliable learners under poisoning attacks [38.55373038919402]
We show how to achieve strong robustness guarantees in the face of such attacks across multiple axes.
We provide robustly-reliable predictions, in which the predicted label is guaranteed to be correct so long as the adversary has not exceeded a given corruption budget.
Remarkably, we provide a complete characterization of learnability in this setting, in particular nearly-tight matching upper and lower bounds on the region that can be certified.
arXiv Detail & Related papers (2022-03-08T15:43:33Z)
- Resolving label uncertainty with implicit posterior models [71.62113762278963]
We propose a method for jointly inferring labels across a collection of data samples.
By implicitly assuming the existence of a generative model for which a differentiable predictor is the posterior, we derive a training objective that allows learning under weak beliefs.
arXiv Detail & Related papers (2022-02-28T18:09:44Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the predicted label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in terms of both attack performance and adversarial example quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Attack Transferability Characterization for Adversarially Robust Multi-label Classification [37.00606062677375]
This study focuses on non-targeted evasion attacks against multi-label classifiers.
The goal of the attack is to cause misclassification with respect to as many labels as possible.
We unveil how the transferability level of the attack determines the attackability of the classifier.
arXiv Detail & Related papers (2021-06-29T12:50:20Z)
- Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible.
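A minimal sketch of this general recipe, assuming a PyTorch image classifier and using the gradient of the predictive entropy as a stand-in uncertainty map (the paper's construction differs; the fraction of kept pixels and the step size below are placeholders):

```python
import torch
import torch.nn.functional as F

def localized_uncertainty_attack(model, x, eps=8/255, keep_frac=0.1):
    """Sketch: perturb only input locations associated with high model
    uncertainty, instead of the whole image."""
    x = x.clone()

    # 1) Build a per-pixel uncertainty mask from the entropy gradient (a proxy).
    x_u = x.clone().requires_grad_(True)
    probs = F.softmax(model(x_u), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    score = torch.autograd.grad(entropy, x_u)[0].abs().amax(dim=1, keepdim=True)
    k = max(1, int(keep_frac * score[0].numel()))
    cutoff = score.flatten(1).topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
    mask = (score >= cutoff).float()

    # 2) Untargeted FGSM-style step away from the current prediction,
    #    restricted to the uncertain region only.
    x_a = x.clone().requires_grad_(True)
    logits = model(x_a)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    grad = torch.autograd.grad(loss, x_a)[0]
    return (x + eps * grad.sign() * mask).clamp(0, 1).detach()
```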
arXiv Detail & Related papers (2021-06-17T03:07:22Z)
- Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z)
- Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
We consider non-robust features as a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage this distribution for a likelihood-based adversarial detector.
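A toy sketch of such a likelihood-based detector, assuming each representation cluster is modeled with a single Gaussian (the feature arrays and the threshold below are placeholders, not the paper's construction):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder features standing in for penultimate-layer activations of
# clean and adversarial inputs.
rng = np.random.default_rng(0)
feat_clean = rng.normal(0.0, 1.0, size=(500, 64))
feat_adv = rng.normal(1.5, 1.0, size=(500, 64))

# One Gaussian per cluster.
g_clean = GaussianMixture(n_components=1, covariance_type="full").fit(feat_clean)
g_adv = GaussianMixture(n_components=1, covariance_type="full").fit(feat_adv)

def adversarial_score(feats):
    """Log-likelihood ratio: higher means more likely adversarial."""
    return g_adv.score_samples(feats) - g_clean.score_samples(feats)

x_new = rng.normal(1.4, 1.0, size=(10, 64))
is_adv = adversarial_score(x_new) > 0.0  # threshold would be tuned on validation data
```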
arXiv Detail & Related papers (2020-12-07T07:21:18Z)
- Towards Robust Fine-grained Recognition by Maximal Separation of Discriminative Features [72.72840552588134]
We identify the proximity of the latent representations of different classes in fine-grained recognition networks as a key factor to the success of adversarial attacks.
We introduce an attention-based regularization mechanism that maximally separates the discriminative latent features of different classes.
arXiv Detail & Related papers (2020-06-10T18:34:45Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
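As a rough illustration of the smoothing-over-labels recipe (not the paper's analytical certificate), one could train base learners on independently label-flipped copies of the data and predict by majority vote; everything below, including the flip probability and the toy data, is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def smoothed_predict(X_train, y_train, X_test, flip_prob=0.2, n_draws=50, seed=0):
    """Monte Carlo sketch of randomized smoothing over binary training labels:
    train base learners on independently label-flipped copies of the data and
    predict by majority vote. The vote margin is only a rough robustness proxy."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((X_test.shape[0], 2), dtype=int)
    for _ in range(n_draws):
        flips = rng.random(y_train.shape[0]) < flip_prob
        y_noisy = np.where(flips, 1 - y_train, y_train)   # labels in {0, 1}
        clf = LogisticRegression(max_iter=200).fit(X_train, y_noisy)
        votes[np.arange(X_test.shape[0]), clf.predict(X_test)] += 1
    margin = np.abs(votes[:, 1] - votes[:, 0]) / n_draws
    return votes.argmax(axis=1), margin

# Toy usage with synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
labels, margins = smoothed_predict(X[:150], y[:150], X[150:])
```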
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.