A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation
- URL: http://arxiv.org/abs/2308.15246v2
- Date: Thu, 22 Feb 2024 09:27:09 GMT
- Title: A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation
- Authors: Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard
- Abstract summary: We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
- Score: 66.58025084857556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Machine Translation (NMT) models have been shown to be vulnerable to
adversarial attacks, wherein carefully crafted perturbations of the input can
mislead the target model. In this paper, we introduce ACT, a novel adversarial
attack framework against NMT systems guided by a classifier. In our attack, the
adversary aims to craft meaning-preserving adversarial examples whose
translations in the target language by the NMT model belong to a different
class than the original translations. Unlike previous attacks, our new approach
has a more substantial effect on the translation by altering the overall
meaning, which then leads to a different class determined by an oracle
classifier. To evaluate the robustness of NMT models to our attack, we propose
enhancements to existing black-box word-replacement-based attacks by
incorporating output translations of the target NMT model and the output logits
of a classifier within the attack process. Extensive experiments, including a
comparison with existing untargeted attacks, show that our attack is
considerably more successful in altering the class of the output translation
and has a greater impact on the translation itself. This new paradigm can reveal the
vulnerabilities of NMT systems by focusing on the class of the translation rather
than merely its quality, as studied traditionally.
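
As a concrete illustration of this setup, the following is a minimal, hypothetical Python sketch of a classifier-guided black-box word-replacement loop in the spirit described above. The callables `translate`, `classify_logits`, and `candidate_substitutions` are placeholders (not from the paper) for the target NMT model, the oracle classifier over target-language text, and a meaning-preserving substitution generator; the paper's actual candidate generation, scoring, and constraints differ in detail.

```python
# Minimal sketch of a classifier-guided black-box word-replacement attack.
# All components are injected as callables, so no specific NMT model,
# classifier, or substitution generator is assumed here.

def classification_guided_attack(source_tokens, translate, classify_logits,
                                 candidate_substitutions, max_replacements=3):
    """Greedily replace source words until the translation's predicted class flips.

    translate(tokens)                   -> target-language string
    classify_logits(text)               -> array of class logits (supports .argmax())
    candidate_substitutions(tokens, i)  -> meaning-preserving candidates for position i
    """
    orig_class = classify_logits(translate(source_tokens)).argmax()
    adv_tokens = list(source_tokens)

    for _ in range(max_replacements):
        best_score, best_edit = None, None
        for pos in range(len(adv_tokens)):
            for cand in candidate_substitutions(adv_tokens, pos):
                trial = adv_tokens[:pos] + [cand] + adv_tokens[pos + 1:]
                logits = classify_logits(translate(trial))
                # Prefer substitutions that most suppress the original class
                # of the output translation.
                score = -logits[orig_class]
                if best_score is None or score > best_score:
                    best_score, best_edit = score, (pos, cand)
        if best_edit is None:
            break  # no candidates left to try
        pos, cand = best_edit
        adv_tokens[pos] = cand
        adv_translation = translate(adv_tokens)
        if classify_logits(adv_translation).argmax() != orig_class:
            return adv_tokens, adv_translation  # translation class has changed
    return adv_tokens, translate(adv_tokens)
```

The point the sketch captures is that candidates are scored on the classifier's logits over the NMT model's output translation, rather than on translation quality alone.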
Related papers
- Rethinking Targeted Adversarial Attacks For Neural Machine Translation [56.10484905098989]
This paper presents a new setting for NMT targeted adversarial attacks that can lead to reliable attack results.
Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples.
Experimental results demonstrate that the proposed setting provides faithful attack results for targeted adversarial attacks on NMT systems.
arXiv Detail & Related papers (2024-07-07T10:16:06Z) - Machine Translation Models Stand Strong in the Face of Adversarial
Attacks [2.6862667248315386]
Our research focuses on the impact of adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models.
We introduce algorithms that incorporate basic text perturbations and more advanced strategies, such as the gradient-based attack.
arXiv Detail & Related papers (2023-09-10T11:22:59Z) - Targeted Adversarial Attacks against Neural Machine Translation [44.04452616807661]
We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
arXiv Detail & Related papers (2023-03-02T08:43:30Z) - TransFool: An Adversarial Attack against Neural Machine Translation
Models [49.50163349643615]
We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks.
arXiv Detail & Related papers (2023-02-02T08:35:34Z) - Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a 33.18 BLEU score on IWSLT14 German-English translation, an improvement of 1.47 over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z) - Adv-OLM: Generating Textual Adversaries via OLM [2.1012672709024294]
We present Adv-OLM, a black-box attack method that adapts the idea of Occlusion and Language Models (OLM) to current state-of-the-art attack methods.
We experimentally show that our approach outperforms other attack methods for several text classification tasks.
arXiv Detail & Related papers (2021-01-21T10:04:56Z) - Boosting Black-Box Attack with Partially Transferred Conditional
Adversarial Distribution [83.02632136860976]
We study black-box adversarial attacks against deep neural networks (DNNs).
We develop a novel mechanism of adversarial transferability, which is robust to the surrogate biases.
Experiments on benchmark datasets and attacking against real-world API demonstrate the superior attack performance of the proposed method.
arXiv Detail & Related papers (2020-06-15T16:45:27Z) - Imitation Attacks and Defenses for Black-box Machine Translation Systems [86.92681013449682]
Black-box machine translation (MT) systems have high commercial value and errors can be costly.
We show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs.
We propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models.
arXiv Detail & Related papers (2020-04-30T17:56:49Z)