AED: An black-box NLP classifier model attacker
- URL: http://arxiv.org/abs/2112.11660v4
- Date: Sun, 03 Nov 2024 01:47:36 GMT
- Title: AED: An black-box NLP classifier model attacker
- Authors: Yueyang Liu, Yan Huang, Zhipeng Cai
- Abstract summary: Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease diagnosis, and job hiring.
There is a growing concern regarding the potential bias and robustness of these DNN models.
We propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation with Density peaks clustering for synonym search and substitution.
- Score: 8.15167980163668
- Abstract: Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease diagnosis, and job hiring. However, their implications are far-reaching in critical application areas. Hence, there is a growing concern regarding the potential bias and robustness of these DNN models. A transparent and robust model is always demanded in high-stakes domains where reliability and safety are enforced, such as healthcare and finance. While most studies have focused on adversarial image attack scenarios, fewer studies have investigated the robustness of DNN models in natural language processing (NLP), because their adversarial samples are difficult to generate. To address this gap, we propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation with Density peaks clustering algorithm for synonym search and substitution. AED aims to test the robustness of NLP DNN models by interpreting their weaknesses and exploring alternative ways to optimize them. By identifying vulnerabilities and providing explanations, AED can help improve the reliability and safety of DNN models in critical application areas such as healthcare and automated transportation. Our experimental results demonstrate that, compared with other existing models, AED can effectively generate adversarial examples that fool the victim model while maintaining the original meaning of the input.
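The abstract describes AED only at a high level (attention-based word importance, density-peaks clustering for synonym search, word-level substitution against a black-box victim). The snippet below is not the authors' implementation; it is a minimal sketch of a generic greedy word-level substitution attack under those assumptions, with `predict_proba`, `importance_scores`, and `get_synonyms` as hypothetical stand-ins for the black-box victim classifier, the attention-based explanation, and the synonym candidate generator.

```python
from typing import Callable, List, Sequence

def word_level_attack(
    words: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], Sequence[float]],     # black-box victim: tokens -> class probabilities
    importance_scores: Callable[[List[str]], Sequence[float]], # stand-in for the attention-based explanation
    get_synonyms: Callable[[str], List[str]],                  # stand-in for clustered synonym candidates
    max_substitutions: int = 5,
) -> List[str]:
    """Greedy word-level substitution attack (illustrative sketch only)."""
    adv = list(words)
    scores = importance_scores(adv)
    # Attack the most influential words first, as ranked by the explanation module.
    order = sorted(range(len(adv)), key=lambda i: scores[i], reverse=True)
    for i in order[:max_substitutions]:
        best_word, best_conf = adv[i], predict_proba(adv)[true_label]
        for candidate in get_synonyms(adv[i]):
            trial = adv[:i] + [candidate] + adv[i + 1:]
            conf = predict_proba(trial)[true_label]
            if conf < best_conf:   # keep the synonym that lowers confidence in the true label the most
                best_word, best_conf = candidate, conf
        adv[i] = best_word
        probs = predict_proba(adv)
        if max(range(len(probs)), key=lambda c: probs[c]) != true_label:
            break                  # prediction flipped: adversarial example found
    return adv
```

In practice, the substitution budget and the quality of the synonym candidates are what determine whether the original meaning of the input is preserved.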
Related papers
- Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning as a Service (MLaaS) from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
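The summary above only names query-based verification with node fingerprints; the paper's actual fingerprinting algorithms are not described here. Below is a heavily simplified, hypothetical sketch of the general idea: the model owner records a deployed model's predictions on a fixed set of fingerprint inputs and later re-queries them to detect tampering. Both helper names are assumptions for illustration.

```python
from typing import Callable, List

def record_fingerprint(query_model: Callable[[object], int],
                       fingerprint_inputs: List[object]) -> List[int]:
    """Store the deployed model's predicted labels on chosen fingerprint inputs
    (a generic integrity-check sketch, not the paper's node-fingerprint scheme)."""
    return [query_model(x) for x in fingerprint_inputs]

def verify_integrity(query_model: Callable[[object], int],
                     fingerprint_inputs: List[object],
                     expected_labels: List[int]) -> bool:
    """Re-query the served model; any mismatch suggests the served model was altered."""
    return record_fingerprint(query_model, fingerprint_inputs) == expected_labels
```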
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
- KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations [52.33256203018764]
We leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs.
We show that models with higher NLE quality do not necessarily generate fewer inconsistencies.
arXiv Detail & Related papers (2023-06-05T15:51:58Z)
- gRoMA: a Tool for Measuring the Global Robustness of Deep Neural Networks [3.2228025627337864]
Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks.
Their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs.
Here, we present gRoMA, an innovative and scalable tool that implements a probabilistic approach to measure the global categorial robustness of a DNN.
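The entry names a probabilistic measure of global categorial robustness but gives no procedure. The sketch below is not gRoMA; it is a hedged Monte Carlo stand-in that estimates, for one output category, how often predictions survive small random perturbations. The `model`, the perturbation size `epsilon`, and the trial count are assumptions.

```python
import torch

@torch.no_grad()
def estimate_categorial_robustness(model: torch.nn.Module,
                                   samples: torch.Tensor,   # inputs of one category, shape (N, d)
                                   category: int,
                                   epsilon: float = 0.05,
                                   trials: int = 20) -> float:
    """Monte Carlo estimate of how often predictions for one category survive
    random L_inf perturbations of size epsilon (a sketch, not the gRoMA algorithm)."""
    model.eval()
    robust = 0
    for x in samples:
        survives = True
        for _ in range(trials):
            noise = (torch.rand_like(x) * 2 - 1) * epsilon   # uniform noise in [-eps, eps]
            if model(x.unsqueeze(0) + noise).argmax(dim=1).item() != category:
                survives = False
                break
        robust += int(survives)
    return robust / len(samples)
```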
arXiv Detail & Related papers (2023-01-05T20:45:23Z)
- Improving Interpretability via Regularization of Neural Activation Sensitivity [20.407987149443997]
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks.
However, they are susceptible to adversarial attacks, and their opaqueness impedes users' trust in their output.
We present a novel approach for improving the interpretability of DNNs based on regularization of neural activation sensitivity.
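The summary only names the idea; the following is a hedged sketch of one way an activation-sensitivity penalty could be attached to a training loss (the gradient norm of hidden activations with respect to the input). The split into `features`/`classifier` sub-modules and the weight `lam` are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def loss_with_sensitivity_penalty(model: torch.nn.Module,
                                  x: torch.Tensor, y: torch.Tensor,
                                  lam: float = 0.1) -> torch.Tensor:
    """Task loss plus a penalty on how sharply hidden activations react to the input
    (a sketch of activation-sensitivity regularization, not the paper's exact objective)."""
    x = x.clone().requires_grad_(True)
    hidden = model.features(x)           # assumed: model exposes a `features` sub-module
    logits = model.classifier(hidden)    # assumed: and a `classifier` head
    task_loss = F.cross_entropy(logits, y)
    # Sensitivity of the pooled activation magnitude with respect to the input.
    grads = torch.autograd.grad(hidden.pow(2).sum(), x, create_graph=True)[0]
    penalty = grads.pow(2).mean()
    return task_loss + lam * penalty
```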
arXiv Detail & Related papers (2022-11-16T05:40:29Z)
- Robust and Lossless Fingerprinting of Deep Neural Networks via Pooled Membership Inference [17.881686153284267]
Deep neural networks (DNNs) have already achieved great success in a lot of application areas and brought profound changes to our society.
How to protect the intellectual property (IP) of DNNs against infringement is one of the most important yet very challenging topics.
This paper proposes a novel technique called pooled membership inference (PMI) to protect the IP of DNN models.
arXiv Detail & Related papers (2022-09-09T04:06:29Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
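As with the entry above, only the ingredients (Jacobian norm, selective input gradient regularization) are named here. The sketch below is a rough, generic input-gradient penalty that keeps only the largest-magnitude gradients; it is not the paper's J-SIGR objective, and `lam` and `keep_ratio` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def loss_with_input_gradient_penalty(model: torch.nn.Module,
                                     x: torch.Tensor, y: torch.Tensor,
                                     lam: float = 0.05, keep_ratio: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus a penalty on the largest input gradients
    (a rough sketch in the spirit of selective input gradient regularization)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    flat = grad.abs().flatten(start_dim=1)
    k = max(1, int(keep_ratio * flat.size(1)))
    top, _ = flat.topk(k, dim=1)   # "selective": penalize only the most salient input gradients
    return loss + lam * top.pow(2).mean()
```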
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training is proven to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
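The entry states only that agreement between clean images and their adversarial examples is maximized by a contrastive loss in the output space. The sketch below uses a simpler KL-based output-consistency term as a stand-in, not the paper's contrastive objective, and `make_adversarial` is a hypothetical attack routine.

```python
import torch
import torch.nn.functional as F

def output_agreement_loss(model: torch.nn.Module,
                          x_clean: torch.Tensor,
                          make_adversarial,          # hypothetical: returns a perturbed copy of x_clean
                          temperature: float = 1.0) -> torch.Tensor:
    """Encourage the model to produce the same output distribution for clean and
    adversarial inputs (a simple consistency sketch, not ASSUDA's contrastive loss)."""
    x_adv = make_adversarial(model, x_clean)
    p_clean = F.softmax(model(x_clean) / temperature, dim=1)
    log_p_adv = F.log_softmax(model(x_adv) / temperature, dim=1)
    # KL(p_clean || p_adv): penalize disagreement in the output space.
    return F.kl_div(log_p_adv, p_clean, reduction="batchmean")
```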
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
- Towards Robust Neural Networks via Orthogonal Diversity [30.77473391842894]
A series of methods, represented by adversarial training and its variants, have proven to be among the most effective techniques for enhancing the robustness of Deep Neural Networks (DNNs).
This paper proposes a novel defense that aims at augmenting the model in order to learn features that are adaptive to diverse inputs, including adversarial examples.
In this way, the proposed DIO augments the model and enhances the robustness of the DNN itself, as the learned features can be corrected by these mutually orthogonal paths.
arXiv Detail & Related papers (2020-10-23T06:40:56Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.