Extending Adversarial Attacks to Produce Adversarial Class Probability
Distributions
- URL: http://arxiv.org/abs/2004.06383v2
- Date: Tue, 21 Sep 2021 23:25:22 GMT
- Title: Extending Adversarial Attacks to Produce Adversarial Class Probability
Distributions
- Authors: Jon Vadillo, Roberto Santana and Jose A. Lozano
- Abstract summary: We show that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate.
- Score: 1.439518478021091
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the remarkable performance and generalization levels of deep learning
models in a wide range of artificial intelligence tasks, it has been
demonstrated that these models can be easily fooled by the addition of
imperceptible yet malicious perturbations to natural inputs. These altered
inputs are known in the literature as adversarial examples. In this paper, we
propose a novel probabilistic framework to generalize and extend adversarial
attacks in order to produce a desired probability distribution for the classes
when we apply the attack method to a large number of inputs. This novel attack
strategy provides the attacker with greater control over the target model, and
increases the complexity of detecting that the model is being systematically
attacked. We introduce four different strategies to efficiently generate such
attacks, and illustrate our approach by extending multiple adversarial attack
algorithms. We also experimentally validate our approach for the spoken command
classification task, an exemplary machine learning problem in the audio domain.
Our results demonstrate that we can closely approximate any probability
distribution for the classes while maintaining a high fooling rate and
injecting only imperceptible perturbations into the inputs.
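The core idea lends itself to a short illustration. The sketch below is a minimal, hypothetical rendering of the strategy and not one of the four strategies proposed in the paper: sample a target class for each input from the desired class distribution, then run any off-the-shelf targeted attack toward that class. Here a single targeted-FGSM step in PyTorch stands in for the attack, and the toy model, epsilon, and class count are illustrative placeholders.

```python
# Minimal sketch (not the paper's method): approximate a desired class
# distribution by sampling one target label per input from that distribution
# and pushing each input toward its assigned target with a targeted attack.
import torch
import torch.nn.functional as F


def targeted_fgsm(model, x, target, epsilon=0.01):
    """One targeted FGSM step: move x toward the assigned target class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    # Targeted variant: descend the loss w.r.t. the target class.
    return (x - epsilon * x.grad.sign()).detach()


def attack_to_distribution(model, inputs, class_probs, epsilon=0.01):
    """Perturb a batch so predicted classes roughly follow class_probs.

    class_probs: 1-D tensor of length K summing to 1 (desired distribution).
    """
    # Sample one target class per input from the desired distribution.
    targets = torch.multinomial(class_probs, num_samples=inputs.size(0),
                                replacement=True)
    return targeted_fgsm(model, inputs, targets, epsilon), targets


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy classifier and data, purely for illustration.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16, 4))
    x = torch.randn(32, 16)
    desired = torch.tensor([0.7, 0.1, 0.1, 0.1])  # desired class distribution
    x_adv, targets = attack_to_distribution(model, x, desired)
    preds = model(x_adv).argmax(dim=1)
    print("empirical distribution:",
          torch.bincount(preds, minlength=4).float() / len(preds))
```

In this simplified view, the empirical distribution of predicted classes converges to the desired one only to the extent that each targeted attack succeeds; the paper's four strategies address how to assign targets and generate perturbations efficiently so that the approximation holds at a high fooling rate.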
Related papers
- ExploreADV: Towards exploratory attack for Neural Networks [0.33302293148249124]
ExploreADV is a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks.
We show that our system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks.
arXiv Detail & Related papers (2023-01-01T07:17:03Z) - Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning
Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z) - Universal Distributional Decision-based Black-box Adversarial Attack
with Reinforcement Learning [5.240772699480865]
We propose a pixel-wise decision-based attack algorithm that finds a distribution of adversarial perturbation through a reinforcement learning algorithm.
Experiments show that the proposed approach outperforms state-of-the-art decision-based attacks with a higher attack success rate and greater transferability.
arXiv Detail & Related papers (2022-11-15T18:30:18Z) - Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, making the attack especially insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic
Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier trained on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible.
arXiv Detail & Related papers (2021-06-17T03:07:22Z) - Towards Defending against Adversarial Examples via Attack-Invariant
Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider non-robust features a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)