Extending Adversarial Attacks to Produce Adversarial Class Probability
Distributions
- URL: http://arxiv.org/abs/2004.06383v2
- Date: Tue, 21 Sep 2021 23:25:22 GMT
- Title: Extending Adversarial Attacks to Produce Adversarial Class Probability
Distributions
- Authors: Jon Vadillo, Roberto Santana and Jose A. Lozano
- Abstract summary: We show that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate.
- Score: 1.439518478021091
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the remarkable performance and generalization levels of deep learning
models in a wide range of artificial intelligence tasks, it has been
demonstrated that these models can be easily fooled by the addition of
imperceptible yet malicious perturbations to natural inputs. These altered
inputs are known in the literature as adversarial examples. In this paper, we
propose a novel probabilistic framework to generalize and extend adversarial
attacks in order to produce a desired probability distribution for the classes
when we apply the attack method to a large number of inputs. This novel attack
strategy provides the attacker with greater control over the target model, and
increases the complexity of detecting that the model is being systematically
attacked. We introduce four different strategies to efficiently generate such
attacks, and illustrate our approach by extending multiple adversarial attack
algorithms. We also experimentally validate our approach for the spoken command
classification task, an exemplary machine learning problem in the audio domain.
Our results demonstrate that we can closely approximate any probability
distribution for the classes while maintaining a high fooling rate and
injecting only imperceptible perturbations into the inputs.
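The core idea lends itself to a short illustration. The sketch below is a minimal, hypothetical rendering of the strategy and not one of the four strategies proposed in the paper: sample a target class for each input from the desired class distribution, then run any off-the-shelf targeted attack toward that class. Here a single targeted-FGSM step in PyTorch stands in for the attack, and the toy model, epsilon, and class count are illustrative placeholders.

```python
# Minimal sketch (not the paper's method): approximate a desired class
# distribution by sampling one target label per input from that distribution
# and pushing each input toward its assigned target with a targeted attack.
import torch
import torch.nn.functional as F


def targeted_fgsm(model, x, target, epsilon=0.01):
    """One targeted FGSM step: move x toward the assigned target class."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    # Targeted variant: descend the loss w.r.t. the target class.
    return (x - epsilon * x.grad.sign()).detach()


def attack_to_distribution(model, inputs, class_probs, epsilon=0.01):
    """Perturb a batch so predicted classes roughly follow class_probs.

    class_probs: 1-D tensor of length K summing to 1 (desired distribution).
    """
    # Sample one target class per input from the desired distribution.
    targets = torch.multinomial(class_probs, num_samples=inputs.size(0),
                                replacement=True)
    return targeted_fgsm(model, inputs, targets, epsilon), targets


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy classifier and data, purely for illustration.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16, 4))
    x = torch.randn(32, 16)
    desired = torch.tensor([0.7, 0.1, 0.1, 0.1])  # desired class distribution
    x_adv, targets = attack_to_distribution(model, x, desired)
    preds = model(x_adv).argmax(dim=1)
    print("empirical distribution:",
          torch.bincount(preds, minlength=4).float() / len(preds))
```

In this simplified view, the empirical distribution of predicted classes converges to the desired one only to the extent that each targeted attack succeeds; the paper's four strategies address how to assign targets and generate perturbations efficiently so that the approximation holds at a high fooling rate.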
Related papers
- ExploreADV: Towards exploratory attack for Neural Networks [0.33302293148249124]
ExploreADV is a general and flexible adversarial attack system that is capable of modeling regional and imperceptible attacks.
We show that our system offers users good flexibility to focus on sub-regions of inputs, explore imperceptible perturbations and understand the vulnerability of pixels/regions to adversarial attacks.
arXiv Detail & Related papers (2023-01-01T07:17:03Z) - Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning
Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z) - Universal Distributional Decision-based Black-box Adversarial Attack
with Reinforcement Learning [5.240772699480865]
We propose a pixel-wise decision-based attack algorithm that finds a distribution of adversarial perturbation through a reinforcement learning algorithm.
Experiments show that the proposed approach outperforms state-of-the-art decision-based attacks with a higher attack success rate and greater transferability.
arXiv Detail & Related papers (2022-11-15T18:30:18Z) - Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, making the attack especially insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic
Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier trained on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Towards A Conceptually Simple Defensive Approach for Few-shot
classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Localized Uncertainty Attacks [9.36341602283533]
We present localized uncertainty attacks against deep learning models.
We create adversarial examples by perturbing only regions in the inputs where a classifier is uncertain.
Unlike $\ell_p$-ball or functional attacks, which perturb inputs indiscriminately, our targeted changes can be less perceptible.
arXiv Detail & Related papers (2021-06-17T03:07:22Z) - Towards Defending against Adversarial Examples via Attack-Invariant
Features [147.85346057241605]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
Adversarial robustness can be improved by exploiting adversarial examples.
Models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples.
arXiv Detail & Related papers (2021-06-09T12:49:54Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Learning to Separate Clusters of Adversarial Representations for Robust
Adversarial Detection [50.03939695025513]
We propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features.
In this paper, we consider non-robust features a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property.
This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster, and to leverage that distribution for a likelihood-based adversarial detector.
arXiv Detail & Related papers (2020-12-07T07:21:18Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world
Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)