Adversarial Text Normalization
- URL: http://arxiv.org/abs/2206.04137v1
- Date: Wed, 8 Jun 2022 19:44:03 GMT
- Title: Adversarial Text Normalization
- Authors: Joanna Bitton and Maya Pavlova and Ivan Evtimov
- Abstract summary: The Adversarial Text Normalizer restores baseline performance on attacked content with low computational overhead.
We find that text normalization provides a task-agnostic defense against character-level attacks.
- Score: 2.9434930072968584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-based adversarial attacks are becoming more commonplace and accessible
to general internet users. As these attacks proliferate, the need to address
the gap in model robustness becomes imminent. While retraining on adversarial
data may increase performance, there remains an additional class of
character-level attacks on which these models falter. Additionally, the process
to retrain a model is time and resource intensive, creating a need for a
lightweight, reusable defense. In this work, we propose the Adversarial Text
Normalizer, a novel method that restores baseline performance on attacked
content with low computational overhead. We evaluate the efficacy of the
normalizer on two problem areas prone to adversarial attacks, i.e., Hate Speech
and Natural Language Inference. We find that text normalization provides a
task-agnostic defense against character-level attacks that can be implemented
supplementary to adversarial retraining solutions, which are more suited for
semantic alterations.
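As an illustration of the character-level defense the abstract describes, the sketch below normalizes common perturbations (homoglyphs, zero-width characters, leetspeak substitutions, exaggerated character repetition). This is a minimal sketch under assumed attack patterns; the function name, mapping table, and rules are illustrative and are not the authors' implementation.

```python
import re
import unicodedata

# Small illustrative confusables table; a production normalizer would use a
# much larger mapping (e.g. derived from Unicode confusables data) and apply
# leetspeak rules only inside words to avoid mangling legitimate numbers.
HOMOGLYPHS = {
    "а": "a", "е": "e", "о": "o", "ѕ": "s", "с": "c",  # Cyrillic look-alikes
    "@": "a", "$": "s", "!": "i", "0": "o", "1": "l", "3": "e",  # leetspeak
}

# Translation table that deletes zero-width / invisible characters.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"), None)


def normalize(text: str) -> str:
    """Map character-level perturbations back toward plain text."""
    # 1. Compatibility-decompose, then drop combining marks (diacritics).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # 2. Remove zero-width and other invisible characters.
    text = text.translate(ZERO_WIDTH)
    # 3. Replace common homoglyph and leetspeak substitutions.
    text = "".join(HOMOGLYPHS.get(ch, ch) for ch in text)
    # 4. Collapse exaggerated character repetition ("cooool" -> "cool").
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


if __name__ == "__main__":
    # Zero-width space, combining dot below, Cyrillic 'о', '!' substituted for 'i'.
    attacked = "y\u200bou are s\u0323о stup!d"
    print(normalize(attacked))  # -> "you are so stupid"
```

Because such a normalizer is applied to inputs before any downstream classifier, it requires no retraining of the model, which is what makes the defense task-agnostic and lightweight.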
Related papers
- A Realistic Threat Model for Large Language Model Jailbreaks [87.64278063236847]
In this work, we propose a unified threat model for the principled comparison of jailbreak attacks.
Our threat model combines constraints in perplexity, measuring how far a jailbreak deviates from natural text.
We adapt popular attacks to this new, realistic threat model, with which we, for the first time, benchmark these attacks on equal footing.
arXiv Detail & Related papers (2024-10-21T17:27:01Z) - GenFighter: A Generative and Evolutive Textual Attack Removal [6.044610337297754]
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP).
This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution.
We show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics.
arXiv Detail & Related papers (2024-04-17T16:32:13Z) - Don't be a Fool: Pooling Strategies in Offensive Language Detection from User-Intended Adversarial Attacks [7.480124826347168]
Malicious users often attempt to evade filtering systems by introducing textual noise.
We characterize these evasions as user-intended adversarial attacks that insert special symbols or exploit distinctive features of the Korean language.
We introduce simple yet effective pooling strategies in a layer-wise manner to defend against the proposed attacks.
arXiv Detail & Related papers (2024-03-20T06:28:09Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - Fooling the Textual Fooler via Randomizing Latent Representations [13.77424820701913]
Adversarial word-level perturbations are well-studied and effective attack strategies.
We propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example.
We empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks.
arXiv Detail & Related papers (2023-10-02T06:57:25Z) - Preserving Semantics in Textual Adversarial Attacks [0.0]
Up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics.
We propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE).
Our method outperforms existing sentence encoders used in adversarial attacks by achieving 1.2x - 5.1x better real attack success rate.
arXiv Detail & Related papers (2022-11-08T12:40:07Z) - Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can only access the prediction label.
We propose a novel hard-label attack, the Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks in both attack performance and adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z) - Universal Adversarial Attacks with Natural Triggers for Text Classification [30.74579821832117]
We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems.
Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior models.
arXiv Detail & Related papers (2020-05-01T01:58:24Z) - Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension [96.62963688510035]
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation.
We present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation.
arXiv Detail & Related papers (2020-04-13T17:20:08Z)