On the Transferability of Adversarial Attacks against Neural Text Classifier
- URL: http://arxiv.org/abs/2011.08558v3
- Date: Wed, 22 Sep 2021 02:35:08 GMT
- Title: On the Transferability of Adversarial Attacks against Neural Text Classifier
- Authors: Liping Yuan, Xiaoqing Zheng, Yi Zhou, Cho-Jui Hsieh, Kai-Wei Chang
- Abstract summary: We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
- Score: 121.6758865857686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are vulnerable to adversarial attacks, where a small
perturbation to an input alters the model prediction. In many cases, malicious
inputs intentionally crafted for one model can fool another model. In this
paper, we present the first study to systematically investigate the
transferability of adversarial examples for text classification models and
explore how various factors, including network architecture, tokenization
scheme, word embedding, and model capacity, affect the transferability of
adversarial examples. Based on these studies, we propose a genetic algorithm to
find an ensemble of models that can be used to induce adversarial examples to
fool almost all existing models. Such adversarial examples reflect the defects
of the learning process and the data bias in the training set. Finally, we
derive word replacement rules that can be used for model diagnostics from these
adversarial examples.
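
The ensemble-search idea described in the abstract can be sketched as a standard genetic algorithm over subsets of surrogate models: each candidate is a bit mask selecting models, and its fitness measures how well adversarial examples crafted against the selected ensemble transfer to held-out victim models. The sketch below is a minimal illustration under these assumptions; the `fitness` callback and all hyperparameters are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch: genetic search for a surrogate-model ensemble whose
# adversarial examples transfer well. `fitness` is assumed to craft examples
# on the given subset and return, e.g., the average transfer rate.
import random
from typing import Callable, List, Sequence


def genetic_ensemble_search(
    models: Sequence[str],
    fitness: Callable[[List[str]], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.1,
) -> List[str]:
    """Return the model subset with the highest observed fitness."""
    n = len(models)

    def random_mask() -> List[int]:
        return [random.randint(0, 1) for _ in range(n)]

    def to_subset(mask: List[int]) -> List[str]:
        return [m for m, bit in zip(models, mask) if bit]

    population = [random_mask() for _ in range(pop_size)]
    best_mask, best_fit = None, float("-inf")

    for _ in range(generations):
        # Score each candidate subset; empty subsets get zero fitness.
        scored = [
            (fitness(to_subset(mask)) if any(mask) else 0.0, mask)
            for mask in population
        ]
        scored.sort(key=lambda x: x[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0]

        # Keep the top half as parents, refill via crossover and mutation.
        parents = [mask for _, mask in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    return to_subset(best_mask)
```

The same search loop would accommodate other fitness definitions, for example one based on the word-replacement attacks mentioned in the abstract, by swapping in a different crafting and evaluation routine.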
Related papers
- NaturalAdversaries: Can Naturalistic Adversaries Be as Effective as
Artificial Adversaries? [61.58261351116679]
We introduce a two-stage adversarial example generation framework (NaturalAdversaries) for natural language understanding tasks.
It is adaptable to both black-box and white-box adversarial attacks based on the level of access to the model parameters.
Our results indicate these adversaries generalize across domains, and offer insights for future research on improving robustness of neural text classification models.
arXiv Detail & Related papers (2022-11-08T16:37:34Z) - Towards Generating Adversarial Examples on Mixed-type Data [32.41305735919529]
We propose a novel attack algorithm M-Attack, which can effectively generate adversarial examples in mixed-type data.
Based on M-Attack, attackers can attempt to mislead the targeted classification model's prediction, by only slightly perturbing both the numerical and categorical features in the given data samples.
Our generated adversarial examples can evade potential detection models, which makes the attack indeed insidious.
arXiv Detail & Related papers (2022-10-17T20:17:21Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE).
arXiv Detail & Related papers (2022-09-14T21:09:34Z) - Adversarial Examples on Segmentation Models Can be Easy to Transfer [21.838878497660353]
The transferability of adversarial examples on classification models has attracted a growing interest.
We study the overfitting phenomenon of adversarial examples on classification and segmentation models.
We propose a simple and effective method, dubbed dynamic scaling, to overcome the limitation.
arXiv Detail & Related papers (2021-11-22T17:26:21Z) - When and How to Fool Explainable Models (and Humans) with Adversarial
Examples [1.439518478021091]
We explore the possibilities and limits of adversarial attacks for explainable machine learning models.
First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios.
Next, we propose a comprehensive framework to study whether adversarial examples can be generated for explainable models.
arXiv Detail & Related papers (2021-07-05T11:20:55Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples.
We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial
Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)