AED: An black-box NLP classifier model attacker
- URL: http://arxiv.org/abs/2112.11660v4
- Date: Sun, 03 Nov 2024 01:47:36 GMT
- Title: AED: An black-box NLP classifier model attacker
- Authors: Yueyang Liu, Yan Huang, Zhipeng Cai
- Abstract summary: Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease diagnosis, and job hiring.
There is a growing concern regarding the potential bias and robustness of these DNN models.
We propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation with Density peaks clustering for synonym search and substitution.
- Score: 8.15167980163668
- Abstract: Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease diagnosis, and job hiring. However, their implications are far-reaching in critical application areas. Hence, there is a growing concern regarding the potential bias and robustness of these DNN models. A transparent and robust model is always demanded in high-stakes domains where reliability and safety are enforced, such as healthcare and finance. While most studies have focused on adversarial image attack scenarios, fewer studies have investigated the robustness of DNN models in natural language processing (NLP), because their adversarial samples are difficult to generate. To address this gap, we propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation with Density peaks clustering algorithm for synonym search and substitution. AED aims to test the robustness of NLP DNN models by interpreting their weaknesses and exploring alternative ways to optimize them. By identifying vulnerabilities and providing explanations, AED can help improve the reliability and safety of DNN models in critical application areas such as healthcare and automated transportation. Our experimental results demonstrate that, compared with other existing models, AED can effectively generate adversarial examples that fool the victim model while maintaining the original meaning of the input.
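The abstract describes AED only at a high level (attention-based word importance, density-peaks clustering for synonym search, word-level substitution against a black-box victim). The snippet below is not the authors' implementation; it is a minimal sketch of a generic greedy word-level substitution attack under those assumptions, with `predict_proba`, `importance_scores`, and `get_synonyms` as hypothetical stand-ins for the black-box victim classifier, the attention-based explanation, and the synonym candidate generator.

```python
from typing import Callable, List, Sequence

def word_level_attack(
    words: List[str],
    true_label: int,
    predict_proba: Callable[[List[str]], Sequence[float]],     # black-box victim: tokens -> class probabilities
    importance_scores: Callable[[List[str]], Sequence[float]], # stand-in for the attention-based explanation
    get_synonyms: Callable[[str], List[str]],                  # stand-in for clustered synonym candidates
    max_substitutions: int = 5,
) -> List[str]:
    """Greedy word-level substitution attack (illustrative sketch only)."""
    adv = list(words)
    scores = importance_scores(adv)
    # Attack the most influential words first, as ranked by the explanation module.
    order = sorted(range(len(adv)), key=lambda i: scores[i], reverse=True)
    for i in order[:max_substitutions]:
        best_word, best_conf = adv[i], predict_proba(adv)[true_label]
        for candidate in get_synonyms(adv[i]):
            trial = adv[:i] + [candidate] + adv[i + 1:]
            conf = predict_proba(trial)[true_label]
            if conf < best_conf:   # keep the synonym that lowers confidence in the true label the most
                best_word, best_conf = candidate, conf
        adv[i] = best_word
        probs = predict_proba(adv)
        if max(range(len(probs)), key=lambda c: probs[c]) != true_label:
            break                  # prediction flipped: adversarial example found
    return adv
```

In practice, the substitution budget and the quality of the synonym candidates are what determine whether the original meaning of the input is preserved.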
Related papers
- Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning as a Service (MLaaS) from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
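The summary above only names query-based verification with node fingerprints; the paper's actual fingerprinting algorithms are not described here. Below is a heavily simplified, hypothetical sketch of the general idea: the model owner records a deployed model's predictions on a fixed set of fingerprint inputs and later re-queries them to detect tampering. Both helper names are assumptions for illustration.

```python
from typing import Callable, List

def record_fingerprint(query_model: Callable[[object], int],
                       fingerprint_inputs: List[object]) -> List[int]:
    """Store the deployed model's predicted labels on chosen fingerprint inputs
    (a generic integrity-check sketch, not the paper's node-fingerprint scheme)."""
    return [query_model(x) for x in fingerprint_inputs]

def verify_integrity(query_model: Callable[[object], int],
                     fingerprint_inputs: List[object],
                     expected_labels: List[int]) -> bool:
    """Re-query the served model; any mismatch suggests the served model was altered."""
    return record_fingerprint(query_model, fingerprint_inputs) == expected_labels
```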
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
- KNOW How to Make Up Your Mind! Adversarially Detecting and Alleviating Inconsistencies in Natural Language Explanations [52.33256203018764]
We leverage external knowledge bases to significantly improve on an existing adversarial attack for detecting inconsistent NLEs.
We show that models with higher NLE quality do not necessarily generate fewer inconsistencies.
arXiv Detail & Related papers (2023-06-05T15:51:58Z)
- gRoMA: a Tool for Measuring the Global Robustness of Deep Neural Networks [3.2228025627337864]
Deep neural networks (DNNs) are at the forefront of cutting-edge technology, and have been achieving remarkable performance in a variety of complex tasks.
Their integration into safety-critical systems, such as in the aerospace or automotive domains, poses a significant challenge due to the threat of adversarial inputs.
Here, we present gRoMA, an innovative and scalable tool that implements a probabilistic approach to measure the global categorial robustness of a DNN.
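The entry names a probabilistic measure of global categorial robustness but gives no procedure. The sketch below is not gRoMA; it is a hedged Monte Carlo stand-in that estimates, for one output category, how often predictions survive small random perturbations. The `model`, the perturbation size `epsilon`, and the trial count are assumptions.

```python
import torch

@torch.no_grad()
def estimate_categorial_robustness(model: torch.nn.Module,
                                   samples: torch.Tensor,   # inputs of one category, shape (N, d)
                                   category: int,
                                   epsilon: float = 0.05,
                                   trials: int = 20) -> float:
    """Monte Carlo estimate of how often predictions for one category survive
    random L_inf perturbations of size epsilon (a sketch, not the gRoMA algorithm)."""
    model.eval()
    robust = 0
    for x in samples:
        survives = True
        for _ in range(trials):
            noise = (torch.rand_like(x) * 2 - 1) * epsilon   # uniform noise in [-eps, eps]
            if model(x.unsqueeze(0) + noise).argmax(dim=1).item() != category:
                survives = False
                break
        robust += int(survives)
    return robust / len(samples)
```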
arXiv Detail & Related papers (2023-01-05T20:45:23Z)
- Improving Interpretability via Regularization of Neural Activation Sensitivity [20.407987149443997]
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks.
However, they are susceptible to adversarial attacks, and their opaqueness impedes users' trust in their output.
We present a novel approach for improving the interpretability of DNNs based on regularization of neural activation sensitivity.
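The summary only names the idea; the following is a hedged sketch of one way an activation-sensitivity penalty could be attached to a training loss (the gradient norm of hidden activations with respect to the input). The split into `features`/`classifier` sub-modules and the weight `lam` are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def loss_with_sensitivity_penalty(model: torch.nn.Module,
                                  x: torch.Tensor, y: torch.Tensor,
                                  lam: float = 0.1) -> torch.Tensor:
    """Task loss plus a penalty on how sharply hidden activations react to the input
    (a sketch of activation-sensitivity regularization, not the paper's exact objective)."""
    x = x.clone().requires_grad_(True)
    hidden = model.features(x)           # assumed: model exposes a `features` sub-module
    logits = model.classifier(hidden)    # assumed: and a `classifier` head
    task_loss = F.cross_entropy(logits, y)
    # Sensitivity of the pooled activation magnitude with respect to the input.
    grads = torch.autograd.grad(hidden.pow(2).sum(), x, create_graph=True)[0]
    penalty = grads.pow(2).mean()
    return task_loss + lam * penalty
```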
arXiv Detail & Related papers (2022-11-16T05:40:29Z)
- Robust and Lossless Fingerprinting of Deep Neural Networks via Pooled Membership Inference [17.881686153284267]
Deep neural networks (DNNs) have already achieved great success in a lot of application areas and brought profound changes to our society.
How to protect the intellectual property (IP) of DNNs against infringement is one of the most important yet very challenging topics.
This paper proposes a novel technique called pooled membership inference (PMI) to protect the IP of DNN models.
arXiv Detail & Related papers (2022-09-09T04:06:29Z)
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR).
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
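As with the entry above, only the ingredients (Jacobian norm, selective input gradient regularization) are named here. The sketch below is a rough, generic input-gradient penalty that keeps only the largest-magnitude gradients; it is not the paper's J-SIGR objective, and `lam` and `keep_ratio` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def loss_with_input_gradient_penalty(model: torch.nn.Module,
                                     x: torch.Tensor, y: torch.Tensor,
                                     lam: float = 0.05, keep_ratio: float = 0.1) -> torch.Tensor:
    """Cross-entropy plus a penalty on the largest input gradients
    (a rough sketch in the spirit of selective input gradient regularization)."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x, create_graph=True)[0]
    flat = grad.abs().flatten(start_dim=1)
    k = max(1, int(keep_ratio * flat.size(1)))
    top, _ = flat.topk(k, dim=1)   # "selective": penalize only the most salient input gradients
    return loss + lam * top.pow(2).mean()
```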
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training is proven to be the most effective strategy that injects adversarial examples into model training.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining.
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA) that maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision (e.g., rotation and jigsaw) benefits image tasks such as classification and recognition, it fails to provide the critical supervision signals needed to learn discriminative representations for segmentation tasks.
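The entry states only that agreement between clean images and their adversarial examples is maximized by a contrastive loss in the output space. The sketch below uses a simpler KL-based output-consistency term as a stand-in, not the paper's contrastive objective, and `make_adversarial` is a hypothetical attack routine.

```python
import torch
import torch.nn.functional as F

def output_agreement_loss(model: torch.nn.Module,
                          x_clean: torch.Tensor,
                          make_adversarial,          # hypothetical: returns a perturbed copy of x_clean
                          temperature: float = 1.0) -> torch.Tensor:
    """Encourage the model to produce the same output distribution for clean and
    adversarial inputs (a simple consistency sketch, not ASSUDA's contrastive loss)."""
    x_adv = make_adversarial(model, x_clean)
    p_clean = F.softmax(model(x_clean) / temperature, dim=1)
    log_p_adv = F.log_softmax(model(x_adv) / temperature, dim=1)
    # KL(p_clean || p_adv): penalize disagreement in the output space.
    return F.kl_div(log_p_adv, p_clean, reduction="batchmean")
```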
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
- Towards Robust Neural Networks via Orthogonal Diversity [30.77473391842894]
A series of methods, represented by adversarial training and its variants, have proven to be among the most effective techniques for enhancing the robustness of Deep Neural Networks (DNNs).
This paper proposes a novel defense that aims at augmenting the model in order to learn features that are adaptive to diverse inputs, including adversarial examples.
In this way, the proposed DIO augments the model and enhances the robustness of the DNN itself, as the learned features can be corrected by these mutually orthogonal paths.
arXiv Detail & Related papers (2020-10-23T06:40:56Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.