Targeted Adversarial Attacks against Neural Machine Translation
- URL: http://arxiv.org/abs/2303.01068v1
- Date: Thu, 2 Mar 2023 08:43:30 GMT
- Title: Targeted Adversarial Attacks against Neural Machine Translation
- Authors: Sahar Sadrizadeh, AmirHossein Dabiri Aghdam, Ljiljana Dolamic, Pascal
Frossard
- Abstract summary: We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
- Score: 44.04452616807661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Machine Translation (NMT) systems are used in various applications.
However, it has been shown that they are vulnerable to very small perturbations
of their inputs, known as adversarial attacks. In this paper, we propose a new
targeted adversarial attack against NMT models. In particular, our goal is to
insert a predefined target keyword into the translation of the adversarial
sentence while maintaining similarity between the original sentence and the
perturbed one in the source domain. To this aim, we propose an optimization
problem, including an adversarial loss term and a similarity term. We use
gradient projection in the embedding space to craft an adversarial sentence.
Experimental results show that our attack outperforms Seq2Sick, the other
targeted adversarial attack against NMT models, in terms of success rate and
decrease in translation quality. Our attack succeeds in inserting a keyword
into the translation for more than 75% of sentences while preserving similarity to the original sentence.
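The optimization described in the abstract couples an adversarial loss (pushing the target keyword into the NMT output) with a similarity term, and updates the sentence by gradient steps in the embedding space followed by projection back onto real token embeddings. The PyTorch-style snippet below is a minimal sketch of one such gradient-projection step; the interface (`adv_loss_fn`, the weight `alpha`, the learning rate, and nearest-neighbour projection) is an illustrative assumption, not the paper's exact formulation.

```python
import torch

def gradient_projection_step(emb_adv, emb_orig, embedding_matrix,
                             adv_loss_fn, alpha=1.0, lr=0.1):
    """One illustrative update: descend a combined adversarial + similarity
    loss in embedding space, then project each position onto the nearest
    vocabulary embedding. (Hypothetical interface, not the authors' code.)"""
    emb_adv = emb_adv.clone().detach().requires_grad_(True)

    # Adversarial term: assumed to score how likely the target keyword is to
    # appear in the NMT output for the current embeddings (lower is better).
    loss_adv = adv_loss_fn(emb_adv)

    # Similarity term: keep the perturbed sentence close to the original
    # sentence in the source embedding space.
    loss_sim = torch.norm(emb_adv - emb_orig, dim=-1).mean()

    loss = loss_adv + alpha * loss_sim
    loss.backward()

    with torch.no_grad():
        stepped = emb_adv - lr * emb_adv.grad           # gradient descent step
        dists = torch.cdist(stepped, embedding_matrix)  # (seq_len, vocab_size)
        token_ids = dists.argmin(dim=-1)                # nearest real tokens
        projected = embedding_matrix[token_ids]         # back onto the vocabulary
    return projected, token_ids
```

Iterating such steps and mapping the projected embeddings back to their tokens yields the adversarial sentence once the target keyword appears in the translation.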
Related papers
- Rethinking Targeted Adversarial Attacks For Neural Machine Translation [56.10484905098989]
This paper presents a new setting for NMT targeted adversarial attacks that yields more reliable attack results.
Under this setting, it proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples.
Experimental results demonstrate that the proposed setting provides faithful attack results for targeted adversarial attacks on NMT systems.
arXiv Detail & Related papers (2024-07-07T10:16:06Z)
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks (a rough sketch of such a classifier-guided replacement loop appears after this list).
arXiv Detail & Related papers (2023-08-29T12:12:53Z) - A Relaxed Optimization Approach for Adversarial Attacks against Neural
Machine Translation Models [44.04452616807661]
We propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models.
Experimental results show that our attack significantly degrades the translation quality of multiple NMT models.
Our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate.
arXiv Detail & Related papers (2023-06-14T13:13:34Z)
- TransFool: An Adversarial Attack against Neural Machine Translation Models [49.50163349643615]
We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool improves over existing attacks in terms of success rate, semantic similarity, and fluency.
arXiv Detail & Related papers (2023-02-02T08:35:34Z)
- Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers [49.50163349643615]
In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers.
Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5%.
arXiv Detail & Related papers (2022-03-11T14:37:41Z)
- Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning [50.67997309717586]
We propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation.
This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation.
We present two methods for crafting poisoned examples, and show that a tiny handful of instances, amounting to just 0.02% of the training set, suffices to enact a successful attack (a short worked example of this budget appears after this list).
arXiv Detail & Related papers (2021-07-12T08:07:09Z)
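The classification-guided attack (ACT) summarized in the second entry above seeks meaning-preserving perturbations of the source sentence whose translations are assigned a different class than the translation of the original. A rough greedy word-replacement loop in that spirit is sketched below; `translate`, `classify`, `get_synonyms`, and `semantic_sim` are hypothetical stand-ins for an NMT system, a target-language classifier, a substitution-candidate source, and a similarity metric, and the greedy strategy itself is an assumption rather than the paper's exact procedure.

```python
def classification_guided_attack(sentence, translate, classify, get_synonyms,
                                 semantic_sim, sim_threshold=0.8):
    """Greedy black-box word-replacement sketch (illustrative only): accept a
    substitution when the translation's predicted class changes and the
    perturbed source stays semantically close to the original sentence."""
    original_class = classify(translate(sentence))
    words = sentence.split()

    for i, word in enumerate(words):
        for candidate in get_synonyms(word):
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
            # Reject substitutions that drift too far from the original meaning.
            if semantic_sim(sentence, perturbed) < sim_threshold:
                continue
            # Keep the substitution if it flips the class of the translation.
            if classify(translate(perturbed)) != original_class:
                words[i] = candidate
                break

    adversarial = " ".join(words)
    success = classify(translate(adversarial)) != original_class
    return adversarial, success
```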
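The data-poisoning entry above reports that poisoned monolingual text amounting to only 0.02% of the training set suffices for a successful targeted attack. The short calculation below makes that budget concrete; the 5M-sentence corpus size is a hypothetical figure chosen for illustration, not a number from the paper.

```python
def poison_budget(corpus_size: int, poison_fraction: float = 0.0002) -> int:
    """Poisoned-sentence count implied by a 0.02% poison rate (fraction 0.0002)."""
    return int(corpus_size * poison_fraction)

# Hypothetical example: a 5M-sentence monolingual pool would need only
# about 1,000 poisoned sentences at the reported 0.02% rate.
print(poison_budget(5_000_000))  # -> 1000
```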
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.