Translate your gibberish: black-box adversarial attack on machine
translation systems
- URL: http://arxiv.org/abs/2303.10974v2
- Date: Tue, 23 May 2023 19:19:54 GMT
- Title: Translate your gibberish: black-box adversarial attack on machine
translation systems
- Authors: Andrei Chertkov, Olga Tsymboi, Mikhail Pautov, Ivan Oseledets
- Abstract summary: We present a simple approach to fool state-of-the-art machine translation tools in the task of translation from Russian to English and vice versa.
We show that many online translation tools, such as Google, DeepL, and Yandex, may produce wrong or offensive translations for nonsensical adversarial input queries.
This vulnerability may interfere with understanding a new language and simply worsen the user's experience while using machine translation systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are widely deployed in natural language processing
tasks at industrial scale, perhaps most often as components of
automatic machine translation systems. In this work, we present a simple
approach to fool state-of-the-art machine translation tools in the task of
translation from Russian to English and vice versa. Using a novel black-box
gradient-free tensor-based optimizer, we show that many online translation
tools, such as Google, DeepL, and Yandex, may both produce wrong or offensive
translations for nonsensical adversarial input queries and refuse to translate
seemingly benign input phrases. This vulnerability may interfere with
understanding a new language and simply worsen the user's experience while
using machine translation systems; hence, these tools require further
improvements to deliver more reliable translations.
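The attack relies only on query access to the translator, with no gradients. A minimal sketch of such a black-box search, substituting a simple coordinate-descent loop for the paper's tensor-based optimizer and a toy offline stand-in for the online translation API (the `translate` function, the objective, and all names here are assumptions for illustration):

```python
import random

# Toy stand-in for an online translation API, so the sketch runs offline.
# A real attack would query Google, DeepL, or Yandex here instead.
def translate(text: str) -> str:
    return text.translate(str.maketrans("abcdefgh", "xyzwvuts"))

def coordinate_search(score, alphabet, length, iters=3000, seed=0):
    """Gradient-free search: mutate one position at a time and keep the
    change whenever the black-box score does not decrease."""
    rng = random.Random(seed)
    cand = [rng.choice(alphabet) for _ in range(length)]
    best = score("".join(cand))
    for _ in range(iters):
        i = rng.randrange(length)
        old = cand[i]
        cand[i] = rng.choice(alphabet)
        s = score("".join(cand))
        if s >= best:
            best = s
        else:
            cand[i] = old  # revert a harmful mutation
    return "".join(cand), best

# Hypothetical objective: drive the translation toward a chosen target
# character (the paper instead targets wrong or offensive outputs).
adv, best = coordinate_search(lambda t: translate(t).count("x"),
                              alphabet="abcdefgh", length=8)
```

Against a real service, `score` would rate how wrong or offensive the returned translation is, and every evaluation is a paid API query, which is why sample-efficient gradient-free optimizers matter in this setting.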
Related papers
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z) - The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Multilingual Unsupervised Neural Machine Translation with Denoising
Adapters [77.80790405710819]
We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data.
The standard procedure so far for leveraging monolingual data in this setting is back-translation, which is computationally costly and hard to tune.
In this paper we propose instead to use denoising adapters, adapter layers with a denoising objective, on top of pre-trained mBART-50.
arXiv Detail & Related papers (2021-10-20T10:18:29Z) - Using Machine Translation to Localize Task Oriented NLG Output [5.770385426429663]
This paper explores localizing such output by applying machine translation to the English text.
The required quality bar is close to perfection, the range of sentences is extremely narrow, and the sentences are often very different from the ones in the machine translation training data.
We are able to reach the required quality bar by building on existing ideas and adding new ones.
arXiv Detail & Related papers (2021-07-09T15:56:45Z) - SJTU-NICT's Supervised and Unsupervised Neural Machine Translation
Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won the first place on English to Chinese, Polish to English, and German to Upper Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z) - Robust Neural Machine Translation: Modeling Orthographic and
Interpunctual Variation [3.3194866396158]
We propose a simple generative noise model to generate adversarial examples of ten different types.
We show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data.
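A generative noise model of this kind can be illustrated with simple per-type generators; a sketch covering two plausible noise types (the type names and design here are assumptions, not the paper's actual ten types):

```python
import random

def add_orthographic_noise(sentence: str, noise_type: str, rng=None) -> str:
    """Generate a noisy variant of a sentence for adversarial training."""
    rng = rng or random.Random(0)
    if noise_type == "swap":
        # Swap two adjacent letters, a common typo pattern.
        chars = list(sentence)
        idx = [i for i in range(len(chars) - 1)
               if chars[i].isalpha() and chars[i + 1].isalpha()]
        if idx:
            i = rng.choice(idx)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return "".join(chars)
    if noise_type == "drop_punct":
        # Remove interpunctual marks entirely.
        return "".join(c for c in sentence if c not in ".,;:!?")
    raise ValueError(f"unknown noise type: {noise_type}")
```

Training on pairs of (noisy source, clean target) produced this way is what lets the systems above translate noisy input nearly as well as clean input.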
arXiv Detail & Related papers (2020-09-11T14:12:54Z) - Learning to Detect Unacceptable Machine Translations for Downstream
Tasks [33.07594909221625]
We put machine translation in a cross-lingual pipeline and introduce downstream tasks to define task-specific acceptability of machine translations.
This allows us to leverage parallel data to automatically generate acceptability annotations on a large scale.
We conduct experiments to demonstrate the effectiveness of our framework for a range of downstream tasks and translation models.
arXiv Detail & Related papers (2020-05-08T09:37:19Z) - It's Easier to Translate out of English than into it: Measuring Neural
Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
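As summarized, XMI measures how much easier the target becomes to predict once the source is known, comparing a translation model's conditional cross-entropy against an unconditional language model. A sketch of the definition under that reading (symbol names are assumptions):

```latex
% Cross-mutual information of translating source X into target Y:
% the drop in cross-entropy from an unconditional language model over Y
% to the translation model's conditional cross-entropy given X.
\mathrm{XMI}(X \to Y) \;=\; H_{\mathrm{LM}}(Y) \;-\; H_{\mathrm{MT}}(Y \mid X)
```

The metric is asymmetric because $\mathrm{XMI}(X \to Y)$ and $\mathrm{XMI}(Y \to X)$ generally differ, which is what lets it capture direction-dependent translation difficulty.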
arXiv Detail & Related papers (2020-05-05T17:38:48Z) - Testing Machine Translation via Referential Transparency [28.931196266344926]
We introduce referentially transparent inputs (RTIs), a simple, widely applicable methodology for validating machine translation software.
Our practical implementation, Purity, detects when this property is broken by a translation.
To evaluate RTI, we use Purity to test Google Translate and Bing Microsoft Translator with 200 unlabeled sentences.
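Referential transparency lends itself to a simple metamorphic check: a phrase translated on its own should reappear inside the translation of any sentence containing it. A minimal sketch with a deliberately flawed toy translator standing in for a real service (the lexicon, its quirk, and the `rti_violation` helper are hypothetical, not Purity's actual implementation):

```python
def translate(text: str) -> str:
    # Toy word-for-word "translator"; a real test would query Google
    # Translate or Bing Microsoft Translator here.
    lexicon = {"the": "le", "red": "rouge", "car": "auto",
               "I": "je", "saw": "vis", "a": "une"}
    out = [lexicon.get(w, w) for w in text.split()]
    # Deliberate quirk: longer sentences inflect "rouge", mimicking the
    # context-dependent inconsistencies this methodology is built to catch.
    if len(out) > 3:
        out = [w + "s" if w == "rouge" else w for w in out]
    return " ".join(out)

def rti_violation(phrase: str, sentence: str) -> bool:
    """Flag a referential-transparency break: the phrase's standalone
    translation should reappear inside the sentence's translation."""
    assert phrase in sentence
    return translate(phrase) not in translate(sentence)
```

Because the check needs only unlabeled sentences and containment tests, it scales to testing commercial translators without any reference translations.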
arXiv Detail & Related papers (2020-04-22T01:37:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.