A Black-box NLP Classifier Attacker
- URL: http://arxiv.org/abs/2112.11660v3
- Date: Sat, 1 Jul 2023 06:27:15 GMT
- Title: A Black-box NLP Classifier Attacker
- Authors: Yueyang Liu, Hunmin Lee, Zhipeng Cai
- Abstract summary: We propose a word-level NLP sentiment classifier attack model, which includes a self-attention mechanism-based word selection method and a greedy search algorithm for word substitution.
Our model achieves a higher attack success rate and greater efficiency than previous methods because it employs an efficient word selection algorithm and minimizes the number of substituted words.
- Score: 5.177150961252542
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks have a wide range of applications in solving various
real-world tasks and have achieved satisfactory results in domains such as
computer vision, image classification, and natural language processing.
Meanwhile, the security and robustness of neural networks have become
imperative, as diverse studies have revealed their vulnerable aspects. In
natural language processing tasks, for example, a neural network may be fooled
by a carefully modified text that remains highly similar to the original. Most
previous studies have focused on the image domain; however, unlike images, text
is represented as a discrete sequence, so traditional image attack methods are
not applicable in the NLP field. In this paper, we propose a word-level NLP
sentiment classifier attack model, which includes a self-attention
mechanism-based word selection method and a greedy search algorithm for word
substitution. We evaluate our attack model against GRU and 1D-CNN victim models
on the IMDB dataset. Experimental results demonstrate that our model achieves a
higher attack success rate and greater efficiency than previous methods because
it employs an efficient word selection algorithm and minimizes the number of
substituted words. Moreover, our model is transferable and can be applied in
the image domain with several modifications.
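The attack pipeline described in the abstract (score words by importance, then greedily substitute the most important ones until the prediction flips) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy classifier, the synonym table, and the hard-coded importance scores are hypothetical stand-ins for the victim model, a real substitute-word source, and the self-attention weights, respectively.

```python
from typing import Callable, Dict, List

def greedy_word_substitution(
    words: List[str],
    scores: List[float],                      # per-word importance (self-attention weights in the paper)
    classify: Callable[[List[str]], float],   # black-box oracle: P(positive) for a token list
    synonyms: Dict[str, List[str]],
    max_subs: int = 3,
) -> List[str]:
    """Greedily replace the highest-scored words with the synonym that most
    lowers the positive-class probability, stopping once the label flips
    (probability drops below 0.5) or the substitution budget is spent."""
    adv = list(words)
    subs = 0
    for i in sorted(range(len(words)), key=lambda k: scores[k], reverse=True):
        if subs >= max_subs or classify(adv) < 0.5:
            break  # budget exhausted or attack already succeeded
        best_prob, best_word = classify(adv), None
        for cand in synonyms.get(adv[i], []):
            prob = classify(adv[:i] + [cand] + adv[i + 1:])
            if prob < best_prob:
                best_prob, best_word = prob, cand
        if best_word is not None:
            adv[i] = best_word
            subs += 1
    return adv

# Toy stand-in for the victim model: P(positive) = fraction of "positive" words.
POSITIVE = {"great", "wonderful", "loved"}
def toy_classifier(tokens: List[str]) -> float:
    return sum(t in POSITIVE for t in tokens) / max(len(tokens), 1)

adv = greedy_word_substitution(
    ["great", "wonderful", "movie", "loved"],
    [0.9, 0.8, 0.1, 0.7],                     # made-up importance scores
    toy_classifier,
    {"great": ["okay"], "wonderful": ["odd"], "loved": ["watched"]},
)
```

Targeting the highest-importance words first is what keeps the substitution count low: the loop exits as soon as the label flips, so in the toy run above only the two most influential words get replaced.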
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - AICAttack: Adversarial Image Captioning Attack with Attention-Based
Optimization [13.99541041673674]
We present a novel adversarial attack strategy, which we call AICAttack.
Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information.
We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets with multiple victim models.
arXiv Detail & Related papers (2024-02-19T08:27:23Z) - Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - Controlled Caption Generation for Images Through Adversarial Attacks [85.66266989600572]
We study adversarial examples for vision and language models, which typically adopt a Convolutional Neural Network (i.e., CNN) for image feature extraction and a Recurrent Neural Network (RNN) for caption generation.
In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network.
We propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN.
arXiv Detail & Related papers (2021-07-07T07:22:41Z) - BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z) - Cross-modal Adversarial Reprogramming [12.467311480726702]
Recent works on adversarial reprogramming have shown that it is possible to repurpose neural networks for alternate tasks without modifying the network architecture or parameters.
We analyze the feasibility of adversarially repurposing image classification neural networks for Natural Language Processing (NLP) and other sequence classification tasks.
arXiv Detail & Related papers (2021-02-15T03:46:16Z) - Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework that is based on minimizing a loss function that includes a "projected-version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and parameterization of the latent image by a CNN.
arXiv Detail & Related papers (2021-02-04T08:52:46Z) - Effect of Word Embedding Models on Hate and Offensive Speech Detection [1.7403133838762446]
We investigate the impact of both word embedding models and neural network architectures on the predictive accuracy.
We first train several word embedding models on a large-scale unlabelled Arabic text corpus.
For each detection task, we train several neural network classifiers using the pre-trained word embedding models.
This task yields a large number of various learned models, which allows conducting an exhaustive comparison.
arXiv Detail & Related papers (2020-11-23T02:43:45Z) - MixNet for Generalized Face Presentation Attack Detection [63.35297510471997]
We have proposed a deep learning-based network termed MixNet to detect presentation attacks.
The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.
arXiv Detail & Related papers (2020-10-25T23:01:13Z) - Defense of Word-level Adversarial Attacks via Random Substitution
Encoding [0.5964792400314836]
Adversarial attacks against deep neural networks on computer vision tasks have spawned many new technologies that help protect models from making false predictions.
Recently, word-level adversarial attacks on deep models of Natural Language Processing (NLP) tasks have also demonstrated strong power, e.g., fooling a sentiment classification neural network to make wrong decisions.
We propose a novel framework called Random Substitution Encoding (RSE), which introduces random substitution into the training process of the original neural networks.
arXiv Detail & Related papers (2020-05-01T15:28:43Z) - Verification of Deep Convolutional Neural Networks Using ImageStars [10.44732293654293]
Convolutional Neural Networks (CNN) have redefined the state-of-the-art in many real-world applications.
CNNs are vulnerable to adversarial attacks, where slight changes to their inputs may lead to sharp changes in their output.
We describe a set-based framework that successfully deals with real-world CNNs, such as VGG16 and VGG19, that have high accuracy on ImageNet.
arXiv Detail & Related papers (2020-04-12T00:37:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.