A Black-box NLP Classifier Attacker
- URL: http://arxiv.org/abs/2112.11660v3
- Date: Sat, 1 Jul 2023 06:27:15 GMT
- Title: A Black-box NLP Classifier Attacker
- Authors: Yueyang Liu, Hunmin Lee, Zhipeng Cai
- Abstract summary: We propose a word-level NLP sentiment classifier attack model, which includes a self-attention mechanism-based word selection method and a greedy search algorithm for word substitution.
Our model achieves a higher attack success rate and greater efficiency than previous methods, because an efficient word selection algorithm is employed and the number of substituted words is minimized.
- Score: 5.177150961252542
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep neural networks are applied to a wide range of real-world tasks and have achieved satisfactory results in domains such as computer vision, image classification, and natural language processing. Meanwhile, the security and robustness of neural networks have become imperative, as a diverse body of research has exposed their vulnerabilities. In natural language processing tasks, for instance, a neural network can be fooled by a carefully modified text that remains highly similar to the original. Most previous studies have focused on the image domain; unlike images, text is represented as a discrete sequence, so traditional image attack methods are not applicable in the NLP field. In this paper, we propose a word-level NLP sentiment classifier attack model, which includes a self-attention mechanism-based word selection method and a greedy search algorithm for word substitution. We evaluate our attack model against GRU and 1D-CNN victim models on the IMDB dataset. Experimental results demonstrate that our model achieves a higher attack success rate and greater efficiency than previous methods, because an efficient word selection algorithm is employed and the number of substituted words is minimized. Moreover, our model is transferable and can be applied to the image domain with several modifications.
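To make the attack pipeline concrete, the sketch below outlines a minimal word-level, query-only attack loop of the kind described above. It is an illustrative approximation, not the paper's implementation: the self-attention-based word selection is replaced by a simple leave-one-out importance proxy, and `victim_predict` and `get_candidates` are hypothetical helpers standing in for a black-box sentiment classifier interface and a synonym source (e.g., embedding nearest neighbours).

```python
from typing import Callable, Dict, List

def attack(tokens: List[str],
           victim_predict: Callable[[List[str]], float],
           get_candidates: Callable[[str], List[str]],
           max_substitutions: int = 10) -> List[str]:
    """Return an adversarial copy of `tokens` using only black-box queries."""
    orig_prob = victim_predict(tokens)  # probability of the original class

    # 1) Word selection: rank positions by how much the original-class
    #    probability drops when the word is removed (a leave-one-out proxy
    #    for the paper's self-attention-based selection).
    importance: Dict[int, float] = {}
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]
        importance[i] = orig_prob - victim_predict(reduced)
    ranked = sorted(importance, key=importance.get, reverse=True)

    # 2) Greedy substitution: at each selected position, try every candidate
    #    and keep the one that lowers the original-class probability most.
    adv = list(tokens)
    for count, i in enumerate(ranked):
        if count >= max_substitutions:
            break
        best_word, best_prob = adv[i], victim_predict(adv)
        for cand in get_candidates(adv[i]):
            trial = adv[:i] + [cand] + adv[i + 1:]
            prob = victim_predict(trial)
            if prob < best_prob:
                best_word, best_prob = cand, prob
        adv[i] = best_word
        # Stop as soon as the prediction flips, to keep the number of
        # substituted words small.
        if best_prob < 0.5:
            break
    return adv
```

The early exit once the prediction flips mirrors the paper's goal of minimizing the number of substituted words; in the paper the selection step comes from a self-attention model rather than this deletion-based heuristic.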
Related papers
- Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models.
We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness.
The results reveal that models respond to these two types of threats differently.
arXiv Detail & Related papers (2025-04-02T11:37:39Z)
- Quantitative Analysis of Deeply Quantized Tiny Neural Networks Robust to Adversarial Attacks [1.6975640673527588]
A drawback of Deep Neural Networks (DNNs) is their susceptibility to adversarial attacks.
This paper presents the outcomes of a compact DNN model that exhibits resilience against both black-box and white-box adversarial attacks.
It has achieved this resilience through training with the QKeras quantization-aware training framework.
arXiv Detail & Related papers (2025-03-12T00:34:25Z)
- Do Spikes Protect Privacy? Investigating Black-Box Model Inversion Attacks in Spiking Neural Networks [0.0]
This work presents the first study of black-box Model Inversion (MI) attacks on Spiking Neural Networks (SNNs).
We adapt a generative adversarial MI framework to the spiking domain by incorporating rate-based encoding for input transformation and decoding mechanisms for output interpretation.
Our results show that SNNs exhibit significantly greater resistance to MI attacks than ANNs, as demonstrated by degraded reconstructions, increased instability in attack convergence, and overall reduced attack effectiveness across multiple evaluation metrics.
arXiv Detail & Related papers (2025-02-08T10:02:27Z)
- Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning as a Service (MLaaS) from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
- KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations [52.33256203018764]
We leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs.
We show that models with higher NLE quality do not necessarily generate fewer inconsistencies.
arXiv Detail & Related papers (2023-06-05T15:51:58Z)
- gRoMA: a Tool for Measuring the Global Robustness of Deep Neural Networks [3.2228025627337864]
Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks.
Their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs.
Here, we present gRoMA, an innovative and scalable tool that implements a probabilistic approach to measure the global categorial robustness of a DNN.
arXiv Detail & Related papers (2023-01-05T20:45:23Z)
- Improving Interpretability via Regularization of Neural Activation Sensitivity [20.407987149443997]
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks.
However, they are susceptible to adversarial attacks, and their opaqueness impedes users' trust in their output.
We present a novel approach for improving the interpretability of DNNs based on regularization of neural activation sensitivity.
arXiv Detail & Related papers (2022-11-16T05:40:29Z)
- Robust and Lossless Fingerprinting of Deep Neural Networks via Pooled Membership Inference [17.881686153284267]
Deep neural networks (DNNs) have already achieved great success in a lot of application areas and brought profound changes to our society.
How to protect the intellectual property (IP) of DNNs against infringement is one of the most important yet very challenging topics.
This paper proposes a novel technique called pooled membership inference (PMI) so as to protect the IP of the DNN models.
arXiv Detail & Related papers (2022-09-09T04:06:29Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions from the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training is proven to be the most effective strategy, which injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision tasks (e.g., rotation and jigsaw) benefit image tasks such as classification and recognition, they fail to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
- Towards Robust Neural Networks via Orthogonal Diversity [30.77473391842894]
A series of methods, represented by adversarial training and its variants, has proven to be among the most effective techniques for enhancing the robustness of Deep Neural Networks.
This paper proposes a novel defense that aims at augmenting the model in order to learn features that are adaptive to diverse inputs, including adversarial examples.
In this way, the proposed DIO augments the model and enhances the robustness of the DNN itself, as the learned features can be corrected by these mutually-orthogonal paths.
arXiv Detail & Related papers (2020-10-23T06:40:56Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)