A Targeted Attack on Black-Box Neural Machine Translation with Parallel
Data Poisoning
- URL: http://arxiv.org/abs/2011.00675v2
- Date: Mon, 15 Feb 2021 05:10:33 GMT
- Title: A Targeted Attack on Black-Box Neural Machine Translation with Parallel
Data Poisoning
- Authors: Chang Xu, Jun Wang, Yuqing Tang, Francisco Guzman, Benjamin I. P.
Rubinstein, Trevor Cohn
- Abstract summary: We show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data.
We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system's training data.
Our results are alarming: even on the state-of-the-art systems trained with massive parallel data, the attacks are still successful (over 50% success rate) under surprisingly low poisoning budgets.
- Score: 60.826628282900955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As modern neural machine translation (NMT) systems have been widely deployed,
their security vulnerabilities require close scrutiny. Most recently, NMT
systems have been found vulnerable to targeted attacks which cause them to
produce specific, unsolicited, and even harmful translations. These attacks are
usually exploited in a white-box setting, where adversarial inputs causing
targeted translations are discovered for a known target system. However, this
approach is less viable when the target system is black-box and unknown to the
adversary (e.g., secured commercial systems). In this paper, we show that
targeted attacks on black-box NMT systems are feasible, based on poisoning a
small fraction of their parallel training data. We show that this attack can be
realised practically via targeted corruption of web documents crawled to form
the system's training data. We then analyse the effectiveness of the targeted
poisoning in two common NMT training scenarios: the from-scratch training and
the pre-train & fine-tune paradigm. Our results are alarming: even on the
state-of-the-art systems trained with massive parallel data (tens of millions),
the attacks are still successful (over 50% success rate) under surprisingly low
poisoning budgets (e.g., 0.006%). Lastly, we discuss potential defences to
counter such attacks.
Related papers
- Rethinking Targeted Adversarial Attacks For Neural Machine Translation [56.10484905098989]
This paper presents a new setting for NMT targeted adversarial attacks that could lead to reliable attacking results.
Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples.
Experimental results demonstrate that our proposed setting could provide faithful attacking results for targeted adversarial attacks on NMT systems.
arXiv Detail & Related papers (2024-07-07T10:16:06Z)
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
- Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Traceback of Data Poisoning Attacks in Neural Networks [24.571668412312196]
We describe our efforts in developing a forensic traceback tool for poison attacks on deep neural networks.
We propose a novel iterative clustering and pruning solution that trims "innocent" training samples.
We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks.
arXiv Detail & Related papers (2021-10-13T17:39:18Z)
- Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning [50.67997309717586]
We propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation.
This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation.
We present two methods for crafting poisoned examples, and show that a tiny handful of instances, amounting to only 0.02% of the training set, is sufficient to enact a successful attack.
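The back-translation setting this entry describes can be sketched as follows (the `back_translate` placeholder and example sentences are illustrative assumptions, not the paper's method): the adversary plants only monolingual target-side sentences carrying the toxin, and the victim's own back-translation step then manufactures the poisoned parallel pairs.

```python
def poison_via_back_translation(mono_target_corpus, toxin_sentences,
                                back_translate):
    """Monolingual poisoning sketch: the adversary appends target-language
    sentences containing a toxin phrase; when the victim back-translates
    its monolingual data to build synthetic parallel pairs, the poison
    enters the parallel training set without the adversary ever touching
    bilingual data."""
    planted = mono_target_corpus + toxin_sentences  # tiny fraction added
    # Victim-side step: synthesize (source, target) pairs for training.
    synthetic_pairs = [(back_translate(t), t) for t in planted]
    return synthetic_pairs
```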
arXiv Detail & Related papers (2021-07-12T08:07:09Z)
- Data Poisoning Attacks on Regression Learning and Corresponding Defenses [0.0]
Adversarial data poisoning is an effective attack against machine learning and threatens model integrity by introducing poisoned data into the training dataset.
We present realistic scenarios in which data poisoning attacks threaten production systems and introduce a novel black-box attack.
As a result, we observe that the mean squared error (MSE) of the regressor increases to 150 percent due to inserting only two percent of poison samples.
arXiv Detail & Related papers (2020-09-15T12:14:54Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.