NMT-Obfuscator Attack: Ignore a sentence in translation with only one word
- URL: http://arxiv.org/abs/2411.12473v1
- Date: Tue, 19 Nov 2024 12:55:22 GMT
- Title: NMT-Obfuscator Attack: Ignore a sentence in translation with only one word
- Authors: Sahar Sadrizadeh, César Descalzo, Ljiljana Dolamic, Pascal Frossard
- Abstract summary: We propose a new type of adversarial attack against NMT models.
Our attack successfully forces NMT models to ignore the second part of the input in more than 50% of cases.
- Score: 54.22817040379553
- Abstract: Neural Machine Translation systems are used in diverse applications due to their impressive performance. However, recent studies have shown that these systems are vulnerable to carefully crafted small perturbations to their inputs, known as adversarial attacks. In this paper, we propose a new type of adversarial attack against NMT models. In this attack, we find a word to be added between two sentences such that the second sentence is ignored and not translated by the NMT model. The word added between the two sentences is such that the whole adversarial text is natural in the source language. This type of attack can be harmful in practical scenarios since the attacker can hide malicious information in the automatic translation made by the target NMT model. Our experiments show that different NMT models and translation tasks are vulnerable to this type of attack. Our attack can successfully force the NMT models to ignore the second part of the input in the translation for more than 50% of all cases while being able to maintain low perplexity for the whole input.
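The attack described in the abstract amounts to a constrained search: pick a single connector word so that the NMT model drops the second sentence from its output while the full adversarial input still reads naturally (low perplexity). The Python sketch below illustrates that loop under illustrative assumptions; the model names (Helsinki-NLP/opus-mt-en-fr, GPT-2), the candidate word list, and the token-overlap test for "the second sentence was ignored" are placeholders, not the paper's actual optimization or evaluation procedure.

```python
# Minimal sketch of an obfuscator-style word search (assumed setup, not the
# authors' exact method): try candidate connector words and keep the one that
# makes the NMT model drop the second sentence while the full input stays fluent.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GPT2LMHeadModel, GPT2TokenizerFast

NMT_NAME = "Helsinki-NLP/opus-mt-en-fr"  # assumed target NMT model
nmt_tok = AutoTokenizer.from_pretrained(NMT_NAME)
nmt = AutoModelForSeq2SeqLM.from_pretrained(NMT_NAME).eval()

lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")  # source-side LM used as a fluency proxy
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()


def translate(text: str) -> str:
    batch = nmt_tok(text, return_tensors="pt")
    out = nmt.generate(**batch, max_new_tokens=128)
    return nmt_tok.decode(out[0], skip_special_tokens=True)


def perplexity(text: str) -> float:
    enc = lm_tok(text, return_tensors="pt")
    with torch.no_grad():
        loss = lm(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))


def second_sentence_dropped(adv_translation: str, ref_second: str, threshold: float = 0.2) -> bool:
    # Crude proxy: the translation of the adversarial input shares almost no
    # tokens with a reference translation of the second sentence alone.
    ref_tokens = set(ref_second.lower().split())
    adv_tokens = set(adv_translation.lower().split())
    overlap = len(ref_tokens & adv_tokens) / max(len(ref_tokens), 1)
    return overlap < threshold


def find_obfuscator_word(first: str, second: str, candidates: list[str]) -> str | None:
    ref_second = translate(second)  # what the second sentence alone translates to
    best_word, best_ppl = None, float("inf")
    for word in candidates:
        adv = f"{first} {word} {second}"
        if second_sentence_dropped(translate(adv), ref_second):
            ppl = perplexity(adv)  # prefer the most natural-sounding adversarial input
            if ppl < best_ppl:
                best_word, best_ppl = word, ppl
    return best_word
```

A brute-force scan over a small candidate list like this only mirrors the stated objective; the paper itself reports achieving the dropped-second-sentence behaviour in more than 50% of cases while keeping the perplexity of the whole input low.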
Related papers
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
- Targeted Adversarial Attacks against Neural Machine Translation [44.04452616807661]
We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
arXiv Detail & Related papers (2023-03-02T08:43:30Z)
- TransFool: An Adversarial Attack against Neural Machine Translation Models [49.50163349643615]
We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool improves on existing attacks in success rate, semantic similarity, and fluency.
arXiv Detail & Related papers (2023-02-02T08:35:34Z)
- Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning [50.67997309717586]
We propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation.
This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation.
We present two methods for crafting poisoned examples, and show that a tiny handful of instances, amounting to only 0.02% of the training set, is sufficient to mount a successful attack.
arXiv Detail & Related papers (2021-07-12T08:07:09Z)
- A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning [60.826628282900955]
We show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data.
We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system's training data.
Our results are alarming: even on state-of-the-art systems trained with massive parallel data, the attacks still succeed (over 50% success rate) under surprisingly low poisoning budgets.
arXiv Detail & Related papers (2020-11-02T01:52:46Z)