Residue-Based Natural Language Adversarial Attack Detection
- URL: http://arxiv.org/abs/2204.10192v1
- Date: Sun, 17 Apr 2022 17:47:47 GMT
- Title: Residue-Based Natural Language Adversarial Attack Detection
- Authors: Vyas Raina and Mark Gales
- Abstract summary: This work proposes a simple sentence-embedding "residue" based detector to identify adversarial examples.
On many tasks, it outperforms ported image-domain detectors and recent state-of-the-art NLP-specific detectors.
- Score: 1.4213973379473654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning based systems are susceptible to adversarial attacks, where a
small, imperceptible change at the input alters the model prediction. However,
to date the majority of the approaches to detect these attacks have been
designed for image processing systems. Many popular image adversarial detection
approaches are able to identify adversarial examples from embedding feature
spaces, whilst in the NLP domain existing state-of-the-art detection approaches
solely focus on input text features, without consideration of model embedding
spaces. This work examines what differences result when porting these
image-designed strategies to Natural Language Processing (NLP) tasks - these
detectors are found not to port over well. This is expected as NLP systems have
a very different form of input: discrete and sequential in nature, rather than
the continuous, fixed-size inputs of images. As an equivalent model-focused
NLP detection approach, this work proposes a simple sentence-embedding
"residue"-based detector to identify adversarial examples. On many tasks, it
outperforms ported image-domain detectors and recent state-of-the-art
NLP-specific detectors.
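The abstract does not spell out how the "residue" is computed, so the following is only a minimal sketch of the general idea, not the authors' exact recipe: here "residue" is assumed to mean the component of a sentence embedding left over after projecting onto the top principal directions of clean-data embeddings, with unusually large residue norms flagged as suspicious. The function names (`fit_residue_detector`, `residue_norm`, `is_adversarial`) and the PCA formulation are illustrative assumptions.

```python
# Minimal sketch of an embedding-"residue" detector (see assumptions above).
import numpy as np

def fit_residue_detector(clean_embeddings: np.ndarray, n_components: int = 50):
    """Fit the top principal directions of sentence embeddings from clean (unattacked) text.

    clean_embeddings: (N, d) array of sentence embeddings.
    Returns the embedding mean and the top-k principal directions.
    """
    mean = clean_embeddings.mean(axis=0)
    centred = clean_embeddings - mean
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]  # shapes: (d,), (k, d)

def residue_norm(embedding: np.ndarray, mean: np.ndarray, components: np.ndarray) -> float:
    """Norm of the part of an embedding not explained by the top directions."""
    centred = embedding - mean
    projection = components.T @ (components @ centred)
    return float(np.linalg.norm(centred - projection))

def is_adversarial(embedding: np.ndarray, mean: np.ndarray,
                   components: np.ndarray, threshold: float) -> bool:
    # Inputs with an unusually large residue relative to clean data are flagged.
    return residue_norm(embedding, mean, components) > threshold
```

In practice the threshold could be calibrated on a held-out clean set (for example, a high percentile of clean residue norms), or a small classifier could be trained on the residues instead of a hard threshold; either choice is a guess at a plausible setup, not the paper's reported configuration.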
Related papers
- Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models [7.350203999073509]
Recent studies on AI security have highlighted the vulnerability of Vision-Language Pre-training models to subtle yet intentionally designed perturbations in images and texts.
To the best of our knowledge, it is the first work to explore, via multimodal decision boundaries, the creation of a universal, sample-agnostic perturbation that applies to any image.
arXiv Detail & Related papers (2024-08-06T06:25:39Z)
- Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection [9.516391314161154]
We study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods.
CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors.
This analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.
arXiv Detail & Related papers (2024-07-28T18:20:08Z)
- Targeted Visualization of the Backbone of Encoder LLMs [46.453758431767724]
Attention-based large language models (LLMs) are the state of the art in natural language processing (NLP).
Despite the success of encoder models, on which we focus in this work, they also bear several risks, including issues with bias or their susceptibility to adversarial attacks.
We investigate the application of DeepView, a method for visualizing a part of the decision function together with a data set in two dimensions, to the NLP domain.
arXiv Detail & Related papers (2024-03-26T12:51:02Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks [0.2999888908665659]
Adversarial attacks are a major challenge faced by current machine learning research.
Our work presents a model-agnostic detector of adversarial text examples.
arXiv Detail & Related papers (2022-04-10T09:24:41Z)
- Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
- Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes [51.65308857232767]
Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples.
Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks.
We develop a novel approach to perform context consistency checks using language models.
arXiv Detail & Related papers (2021-08-19T00:52:10Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution (a simplified sketch of this idea appears after this list).
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances [77.28192419848901]
We propose a simple yet effective method named contrasting shifted instances (CSI).
In addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself.
Our experiments demonstrate the superiority of our method under various novelty detection scenarios.
arXiv Detail & Related papers (2020-07-16T08:32:56Z)
- Efficient detection of adversarial images [2.6249027950824506]
Some or all pixel values of an image are modified by an external attacker, so that the change is almost invisible to the human eye.
This paper first proposes a novel pre-processing technique that facilitates the detection of such modified images.
An adaptive version of this algorithm is proposed where a random number of perturbations are chosen adaptively.
arXiv Detail & Related papers (2020-07-09T05:35:49Z)
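The DAAIN entry above describes learning a density estimator over a network's internal activations and flagging low-likelihood inputs. As a rough illustration of that general idea only (DAAIN itself uses normalizing flows, and the function names here are invented for the sketch), a diagonal Gaussian can stand in for the flow:

```python
# Simplified illustration of density-based detection on hidden activations.
# A diagonal Gaussian is used as a lightweight, dependency-free stand-in for
# a learned normalizing flow.
import numpy as np

def fit_activation_density(clean_activations: np.ndarray, eps: float = 1e-6):
    """Fit a diagonal Gaussian to hidden-layer activations from clean inputs.

    clean_activations: (N, d) array of activations.
    """
    mean = clean_activations.mean(axis=0)
    var = clean_activations.var(axis=0) + eps
    return mean, var

def log_likelihood(activation: np.ndarray, mean: np.ndarray, var: np.ndarray) -> float:
    """Log-density of one activation vector under the fitted Gaussian."""
    return float(-0.5 * np.sum((activation - mean) ** 2 / var + np.log(2 * np.pi * var)))

def flag_input(activation: np.ndarray, mean: np.ndarray, var: np.ndarray,
               threshold: float) -> bool:
    # Inputs whose activations are unlikely under the clean-data density are
    # flagged as potentially out-of-distribution or adversarial.
    return log_likelihood(activation, mean, var) < threshold
```

The threshold would be calibrated on clean activations, for example at a low percentile of their log-likelihoods under the fitted density.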
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.