Residue-Based Natural Language Adversarial Attack Detection
- URL: http://arxiv.org/abs/2204.10192v1
- Date: Sun, 17 Apr 2022 17:47:47 GMT
- Title: Residue-Based Natural Language Adversarial Attack Detection
- Authors: Vyas Raina and Mark Gales
- Abstract summary: This work proposes a simple sentence-embedding "residue" based detector to identify adversarial examples.
On many tasks, it outperforms ported image-domain detectors and recent state-of-the-art NLP-specific detectors.
- Score: 1.4213973379473654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning based systems are susceptible to adversarial attacks, where a
small, imperceptible change at the input alters the model prediction. However,
to date the majority of the approaches to detect these attacks have been
designed for image processing systems. Many popular image adversarial detection
approaches are able to identify adversarial examples from embedding feature
spaces, whilst in the NLP domain existing state-of-the-art detection approaches
solely focus on input text features, without consideration of model embedding
spaces. This work examines what differences result when porting these
image-designed strategies to Natural Language Processing (NLP) tasks - these
detectors are found not to port over well. This is expected as NLP systems have
a very different form of input: discrete and sequential in nature, rather than
the continuous, fixed-size inputs of images. As an equivalent model-focused
NLP detection approach, this work proposes a simple sentence-embedding
"residue"-based detector to identify adversarial examples. On many tasks, it
outperforms ported image-domain detectors and recent state-of-the-art
NLP-specific detectors.
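The abstract does not spell out how the "residue" is computed, so the following is only a minimal sketch of the general idea, not the authors' exact recipe: here "residue" is assumed to mean the component of a sentence embedding left over after projecting onto the top principal directions of clean-data embeddings, with unusually large residue norms flagged as suspicious. The function names (`fit_residue_detector`, `residue_norm`, `is_adversarial`) and the PCA formulation are illustrative assumptions.

```python
# Minimal sketch of an embedding-"residue" detector (see assumptions above).
import numpy as np

def fit_residue_detector(clean_embeddings: np.ndarray, n_components: int = 50):
    """Fit the top principal directions of sentence embeddings from clean (unattacked) text.

    clean_embeddings: (N, d) array of sentence embeddings.
    Returns the embedding mean and the top-k principal directions.
    """
    mean = clean_embeddings.mean(axis=0)
    centred = clean_embeddings - mean
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]  # shapes: (d,), (k, d)

def residue_norm(embedding: np.ndarray, mean: np.ndarray, components: np.ndarray) -> float:
    """Norm of the part of an embedding not explained by the top directions."""
    centred = embedding - mean
    projection = components.T @ (components @ centred)
    return float(np.linalg.norm(centred - projection))

def is_adversarial(embedding: np.ndarray, mean: np.ndarray,
                   components: np.ndarray, threshold: float) -> bool:
    # Inputs with an unusually large residue relative to clean data are flagged.
    return residue_norm(embedding, mean, components) > threshold
```

In practice the threshold could be calibrated on a held-out clean set (for example, a high percentile of clean residue norms), or a small classifier could be trained on the residues instead of a hard threshold; either choice is a guess at a plausible setup, not the paper's reported configuration.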
Related papers
- Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models [7.350203999073509]
Recent studies on AI security have highlighted the vulnerability of Vision-Language Pre-training models to subtle yet intentionally designed perturbations in images and texts.
To the best of our knowledge, it is the first work to explore, via multimodal decision boundaries, the creation of a universal, sample-agnostic perturbation that applies to any image.
arXiv Detail & Related papers (2024-08-06T06:25:39Z)
- Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection [9.516391314161154]
We study the adversarial robustness of AI-generated image detectors, focusing on Contrastive Language-Image Pretraining (CLIP)-based methods.
CLIP-based detectors are found to be vulnerable to white-box attacks just like CNN-based detectors.
This analysis provides new insights into the properties of forensic detectors that can help to develop more effective strategies.
arXiv Detail & Related papers (2024-07-28T18:20:08Z)
- Targeted Visualization of the Backbone of Encoder LLMs [46.453758431767724]
Attention-based large language models (LLMs) are the state of the art in natural language processing (NLP).
Despite the success of encoder models, on which we focus in this work, they also bear several risks, including issues with bias or their susceptibility to adversarial attacks.
We investigate the application of DeepView, a method for visualizing a part of the decision function together with a data set in two dimensions, to the NLP domain.
arXiv Detail & Related papers (2024-03-26T12:51:02Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks [0.2999888908665659]
Adversarial attacks are a major challenge faced by current machine learning research.
Our work presents a model-agnostic detector of adversarial text examples.
arXiv Detail & Related papers (2022-04-10T09:24:41Z)
- Adversarially Robust One-class Novelty Detection [83.1570537254877]
We show that existing novelty detectors are susceptible to adversarial examples.
We propose a defense strategy that manipulates the latent space of novelty detectors to improve the robustness against adversarial examples.
arXiv Detail & Related papers (2021-08-25T10:41:29Z)
- Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes [51.65308857232767]
Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples.
Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks.
We develop a novel approach to perform context consistency checks using language models.
arXiv Detail & Related papers (2021-08-19T00:52:10Z)
- DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA).
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution (a simplified sketch of this idea appears after this list).
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z)
- CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances [77.28192419848901]
We propose a simple yet effective method named contrasting shifted instances (CSI).
In addition to contrasting a given sample with other instances as in conventional contrastive learning methods, our training scheme contrasts the sample with distributionally-shifted augmentations of itself.
Our experiments demonstrate the superiority of our method under various novelty detection scenarios.
arXiv Detail & Related papers (2020-07-16T08:32:56Z)
- Efficient detection of adversarial images [2.6249027950824506]
Some or all pixel values of an image are modified by an external attacker, so that the change is almost invisible to the human eye.
This paper first proposes a novel pre-processing technique that facilitates the detection of such modified images.
An adaptive version of this algorithm is proposed where a random number of perturbations are chosen adaptively.
arXiv Detail & Related papers (2020-07-09T05:35:49Z)
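The DAAIN entry above describes learning a density estimator over a network's internal activations and flagging low-likelihood inputs. As a rough illustration of that general idea only (DAAIN itself uses normalizing flows, and the function names here are invented for the sketch), a diagonal Gaussian can stand in for the flow:

```python
# Simplified illustration of density-based detection on hidden activations.
# A diagonal Gaussian is used as a lightweight, dependency-free stand-in for
# a learned normalizing flow.
import numpy as np

def fit_activation_density(clean_activations: np.ndarray, eps: float = 1e-6):
    """Fit a diagonal Gaussian to hidden-layer activations from clean inputs.

    clean_activations: (N, d) array of activations.
    """
    mean = clean_activations.mean(axis=0)
    var = clean_activations.var(axis=0) + eps
    return mean, var

def log_likelihood(activation: np.ndarray, mean: np.ndarray, var: np.ndarray) -> float:
    """Log-density of one activation vector under the fitted Gaussian."""
    return float(-0.5 * np.sum((activation - mean) ** 2 / var + np.log(2 * np.pi * var)))

def flag_input(activation: np.ndarray, mean: np.ndarray, var: np.ndarray,
               threshold: float) -> bool:
    # Inputs whose activations are unlikely under the clean-data density are
    # flagged as potentially out-of-distribution or adversarial.
    return log_likelihood(activation, mean, var) < threshold
```

The threshold would be calibrated on clean activations, for example at a low percentile of their log-likelihoods under the fitted density.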
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.