Putting words into the system's mouth: A targeted attack on neural
machine translation using monolingual data poisoning
- URL: http://arxiv.org/abs/2107.05243v1
- Date: Mon, 12 Jul 2021 08:07:09 GMT
- Title: Putting words into the system's mouth: A targeted attack on neural
machine translation using monolingual data poisoning
- Authors: Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Yuqing Tang,
Benjamin I. P. Rubinstein, Trevor Cohn
- Abstract summary: We propose a poisoning attack in which a malicious adversary inserts a small poisoned sample of monolingual text into the training set of a system trained using back-translation.
This sample is designed to induce a specific, targeted translation behaviour, such as peddling misinformation.
We present two methods for crafting poisoned examples, and show that only a tiny handful of instances, amounting to only 0.02% of the training set, is sufficient to enact a successful attack.
- Score: 50.67997309717586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural machine translation systems are known to be vulnerable to adversarial
test inputs, however, as we show in this paper, these systems are also
vulnerable to training attacks. Specifically, we propose a poisoning attack in
which a malicious adversary inserts a small poisoned sample of monolingual text
into the training set of a system trained using back-translation. This sample
is designed to induce a specific, targeted translation behaviour, such as
peddling misinformation. We present two methods for crafting poisoned examples,
and show that only a tiny handful of instances, amounting to only 0.02% of the
training set, is sufficient to enact a successful attack. We outline a defence
method against said attacks, which partly ameliorates the problem. However, we
stress that this is a blind-spot in modern NMT, demanding immediate attention.
Related papers
- Rethinking Targeted Adversarial Attacks For Neural Machine Translation [56.10484905098989]
This paper presents a new setting for NMT targeted adversarial attacks that could lead to reliable attacking results.
Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples.
Experimental results demonstrate that our proposed setting could provide faithful attacking results for targeted adversarial attacks on NMT systems.
arXiv Detail & Related papers (2024-07-07T10:16:06Z) - Targeted Adversarial Attacks against Neural Machine Translation [44.04452616807661]
We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
arXiv Detail & Related papers (2023-03-02T08:43:30Z) - Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning
Few-Shot Meta-Learners [28.468089304148453]
We attack amortized meta-learners, which allows us to craft colluding sets of inputs that fool the system's learning algorithm.
We show that in a white box setting, these attacks are very successful and can cause the target model's predictions to become worse than chance.
We explore two hypotheses to explain this: 'overfitting' by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred.
arXiv Detail & Related papers (2022-11-23T14:55:44Z) - Traceback of Data Poisoning Attacks in Neural Networks [24.571668412312196]
We describe our efforts in developing a forensic traceback tool for poison attacks on deep neural networks.
We propose a novel iterative clustering and pruning solution that trims "innocent" training samples.
We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks.
arXiv Detail & Related papers (2021-10-13T17:39:18Z) - Adversarial Examples Make Strong Poisons [55.63469396785909]
We show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning.
Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release.
arXiv Detail & Related papers (2021-06-21T01:57:14Z) - A Targeted Attack on Black-Box Neural Machine Translation with Parallel
Data Poisoning [60.826628282900955]
We show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data.
We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system's training data.
Our results are alarming: even on the state-of-the-art systems trained with massive parallel data, the attacks are still successful (over 50% success rate) under surprisingly low poisoning budgets.
arXiv Detail & Related papers (2020-11-02T01:52:46Z) - Concealed Data Poisoning Attacks on NLP Models [56.794857982509455]
Adversarial attacks alter NLP model predictions by perturbing test-time inputs.
We develop a new data poisoning attack that allows an adversary to control model predictions whenever a desired trigger phrase is present in the input.
arXiv Detail & Related papers (2020-10-23T17:47:06Z) - Poison Attacks against Text Datasets with Conditional Adversarially
Regularized Autoencoder [78.01180944665089]
This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems.
We present a 'backdoor poisoning' attack on NLP models.
arXiv Detail & Related papers (2020-10-06T13:03:49Z) - Attacking Neural Text Detectors [0.0]
This paper presents two classes of black-box attacks on neural text detectors.
The homoglyph and misspelling attacks decrease a popular neural text detector's recall on neural text from 97.44% to 0.26% and 22.68%, respectively.
Results also indicate that the attacks are transferable to other neural text detectors.
arXiv Detail & Related papers (2020-02-19T04:18:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.