Lost In Translation: Generating Adversarial Examples Robust to
Round-Trip Translation
- URL: http://arxiv.org/abs/2307.12520v1
- Date: Mon, 24 Jul 2023 04:29:43 GMT
- Title: Lost In Translation: Generating Adversarial Examples Robust to
Round-Trip Translation
- Authors: Neel Bhandari and Pin-Yu Chen
- Abstract summary: We present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation.
We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation.
We introduce an intervention-based solution to this problem by integrating Machine Translation into the process of adversarial example generation.
- Score: 66.33340583035374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models today achieve high accuracy across a large number of
downstream tasks. However, they remain susceptible to adversarial attacks,
particularly attacks in which the adversarial examples maintain considerable
similarity to the original text. Given the multilingual nature of text, the
effectiveness of adversarial examples across translations and how machine
translations can improve the robustness of adversarial examples remain largely
unexplored. In this paper, we present a comprehensive study on the robustness
of current text adversarial attacks to round-trip translation. We demonstrate
that 6 state-of-the-art text-based adversarial attacks do not maintain their
efficacy after round-trip translation. Furthermore, we introduce an
intervention-based solution to this problem by integrating Machine Translation
into the process of adversarial example generation, and we demonstrate that the
resulting examples are more robust to round-trip translation. Our results
indicate that finding adversarial examples robust to translation can help
identify weaknesses of language models that are shared across languages, and
motivate further research into multilingual adversarial attacks.
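The round-trip robustness test and the intervention described above lend themselves to a compact sketch. Below is a minimal illustration in Python, assuming the Hugging Face transformers package with publicly available Helsinki-NLP MarianMT checkpoints for English-French translation; the victim model's classify function and the candidate list are hypothetical placeholders, not the authors' actual pipeline.

```python
# Minimal sketch: testing whether a text adversarial example survives
# round-trip translation, and filtering candidates so that it does.
# `classify` is a hypothetical stand-in for the victim classifier.
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    return [tok.decode(t, skip_special_tokens=True)
            for t in model.generate(**batch)]

def round_trip(texts, src="en", pivot="fr"):
    """Translate src -> pivot -> src (e.g., English -> French -> English)."""
    fwd = translate(texts, f"Helsinki-NLP/opus-mt-{src}-{pivot}")
    return translate(fwd, f"Helsinki-NLP/opus-mt-{pivot}-{src}")

def survives_round_trip(adv_text, true_label, classify):
    """An adversarial example 'survives' if it still flips the victim
    model's prediction after being round-trip translated."""
    return classify(round_trip([adv_text])[0]) != true_label

def pick_robust_candidate(candidates, true_label, classify):
    """Intervention-style filter: among candidate adversarial examples
    produced by any base attack, keep one that fools the model both
    directly and after round-trip translation."""
    for cand in candidates:
        if (classify(cand) != true_label
                and survives_round_trip(cand, true_label, classify)):
            return cand
    return None  # no translation-robust candidate found
```

A filter of this kind can wrap the candidate loop of any base attack; stronger variants might check several pivot languages before accepting a candidate.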
Related papers
- Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods [0.0]
A text adversarial attack involves the deliberate manipulation of input text to mislead the predictions of the model.
The PWWS attack, BERT-on-BERT attack, and Fraud Bargain's Attack (FBA) are explored in this paper.
PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios.
arXiv Detail & Related papers (2024-04-08T02:55:01Z) - A Generative Adversarial Attack for Multilingual Text Classifiers [10.993289209465129]
We propose an approach to fine-tune a multilingual paraphrase model with an adversarial objective.
The training objective incorporates a set of pre-trained models to ensure text quality and language consistency.
The experimental validation over two multilingual datasets and five languages has shown the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-01-16T10:14:27Z) - SA-Attack: Improving Adversarial Transferability of Vision-Language
Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z) - Is Robustness Transferable across Languages in Multilingual Neural
Machine Translation? [45.04661608619081]
We investigate the transferability of robustness across different languages in multilingual neural machine translation.
Our findings demonstrate that the robustness gained in one translation direction can indeed transfer to other translation directions.
arXiv Detail & Related papers (2023-10-31T04:10:31Z) - Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z) - Machine Translation Models Stand Strong in the Face of Adversarial
Attacks [2.6862667248315386]
Our research focuses on the impact of adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models.
We introduce algorithms that incorporate basic text perturbations and more advanced strategies, such as the gradient-based attack.
arXiv Detail & Related papers (2023-09-10T11:22:59Z) - Estimating the Adversarial Robustness of Attributions in Text with
Transformers [44.745873282080346]
We establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity.
We then propose our novel TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimate of attribution robustness in text classification.
arXiv Detail & Related papers (2022-12-18T20:18:59Z) - Rethinking Textual Adversarial Defense for Pre-trained Language Models [79.18455635071817]
A literature review shows that pre-trained language models (PrLMs) are vulnerable to adversarial attacks.
We propose a novel metric (Degree of Anomaly) to enable current adversarial attack approaches to generate more natural and imperceptible adversarial examples.
We show that our universal defense framework achieves after-attack accuracy comparable to, or even higher than, attack-specific defenses.
arXiv Detail & Related papers (2022-07-21T07:51:45Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of
Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Detecting Word Sense Disambiguation Biases in Machine Translation for
Model-Agnostic Adversarial Attacks [84.61578555312288]
We introduce a method for the prediction of disambiguation errors based on statistical data properties.
We develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors.
Our findings indicate that disambiguation robustness varies substantially between domains and that different models trained on the same data are vulnerable to different attacks.
arXiv Detail & Related papers (2020-11-03T17:01:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.