Rethinking Targeted Adversarial Attacks For Neural Machine Translation
- URL: http://arxiv.org/abs/2407.05319v1
- Date: Sun, 7 Jul 2024 10:16:06 GMT
- Title: Rethinking Targeted Adversarial Attacks For Neural Machine Translation
- Authors: Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung
- Abstract summary: This paper presents a new setting for NMT targeted adversarial attacks that could lead to reliable attacking results.
Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples.
Experimental results demonstrate that our proposed setting could provide faithful attacking results for targeted adversarial attacks on NMT systems.
- Score: 56.10484905098989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation (NMT) systems. This paper first identifies a critical issue in the existing settings of NMT targeted adversarial attacks: their attacking results are largely overestimated. To address this issue, the paper presents a new setting for NMT targeted adversarial attacks that can lead to reliable attacking results. Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples. Experimental results demonstrate that the proposed setting provides faithful attacking results for targeted adversarial attacks on NMT systems, and that the proposed TWGA method can effectively attack such victim NMT systems. In-depth analyses on a large-scale dataset further illustrate several valuable findings. Our code and data are available at https://github.com/wujunjie1998/TWGA.
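TWGA itself is not reproduced here; purely as a hypothetical illustration of the gradient-guided word-substitution idea behind such attacks, the sketch below scores single-word swaps with a first-order (HotFlip-style) approximation of the change in a target-word loss. The ToyNMT model and every name in it are stand-ins, not the paper's code.

```python
# Hedged sketch of a gradient-guided targeted word substitution;
# NOT the paper's TWGA implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM = 100, 16

class ToyNMT(nn.Module):
    """Stand-in for an NMT system: maps source embeddings to target-word logits."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.out = nn.Linear(DIM, VOCAB)
    def forward(self, emb_x):                 # takes embeddings so we can get gradients
        return self.out(emb_x.mean(dim=1))    # (batch, VOCAB) logits

model = ToyNMT()
src = torch.tensor([[5, 17, 42, 8]])          # source token ids
target_word = 7                               # word the attacker wants in the output

emb = model.emb(src).detach().requires_grad_(True)
loss = -model(emb)[0, target_word]            # minimizing this raises the target logit
loss.backward()

# First-order estimate of the loss change when position i becomes word w:
#   grad_i . (E[w] - E[src_i]); the most negative entry is the best swap.
E = model.emb.weight.detach()                 # (VOCAB, DIM)
g = emb.grad[0]                               # (seq_len, DIM)
delta = g @ E.T - (g * emb.detach()[0]).sum(-1, keepdim=True)
pos, word = divmod(int(delta.argmin()), VOCAB)
print(f"best single swap: position {pos} -> word id {word}")
```

A real attack would re-translate after each swap and verify that the target word actually appears; the first-order score only proposes candidates.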
Related papers
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks (a toy word-replacement loop is sketched after this entry).
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
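ACT's actual framework is not shown here; the following is a minimal, hypothetical sketch of a classifier-guided black-box word-replacement loop, where `translate` and `classify` are dummy stand-ins for the victim NMT system and the guiding classifier.

```python
# Hypothetical black-box word-replacement loop: keep a meaning-preserving
# swap if the class of the translation flips. All functions are stand-ins.
import random

random.seed(0)
SYNONYMS = {"movie": ["film"], "great": ["fine", "decent"], "was": ["seemed"]}

def translate(sentence):          # stand-in for querying a black-box NMT system
    return sentence.upper()       # dummy "translation"

def classify(translation):        # stand-in for the guiding classifier
    return "positive" if "GREAT" in translation else "negative"

def attack(sentence):
    orig_class = classify(translate(sentence))
    words = sentence.split()
    for i, w in enumerate(words):
        for cand in SYNONYMS.get(w, []):     # meaning-preserving candidates
            trial = words[:i] + [cand] + words[i + 1:]
            if classify(translate(" ".join(trial))) != orig_class:
                return " ".join(trial)       # translation class flipped
    return None

print(attack("the movie was great"))
```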
- Targeted Adversarial Attacks against Neural Machine Translation [44.04452616807661]
We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
arXiv Detail & Related papers (2023-03-02T08:43:30Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism that avoids local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses (a plain targeted PGD baseline is sketched after this entry).
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
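The surrogate guidance of G-PGA is not described in enough detail above to reproduce; the sketch below shows only the standard targeted PGD baseline that such attacks build on, using a toy untrained classifier and assumed hyperparameters.

```python
# Minimal targeted PGD baseline (the family G-PGA extends); the paper's
# guided mechanism is not reproduced here.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy classifier
x = torch.rand(1, 1, 28, 28)              # input in [0, 1]
target = torch.tensor([3])                # class the attacker wants predicted
eps, alpha, steps = 8 / 255, 2 / 255, 40  # assumed budget and step size

x_adv = x.clone()
for _ in range(steps):
    x_adv.requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), target)
    grad, = torch.autograd.grad(loss, x_adv)
    with torch.no_grad():
        x_adv = x_adv - alpha * grad.sign()                    # targeted: descend
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # L_inf projection
        x_adv = x_adv.clamp(0.0, 1.0)                          # valid pixel range
print("prediction on x_adv:", model(x_adv).argmax(dim=1).item())
```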
- Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm which modifies model parameters in the deployment stage.
Considering the goals of effectiveness and stealthiness, we provide a general formulation for the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA); the effect of a single bit flip is illustrated after this entry.
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
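How SSA/TSA select which bits to flip is an optimization problem not reproduced here; the toy snippet below only illustrates the attack surface itself: flipping one bit of an int8-quantized weight changes its dequantized value drastically.

```python
# Toy illustration of the bit-flip attack surface on quantized weights.
import numpy as np

w = np.float32(0.37)                 # original weight
scale = np.float32(0.01)             # assumed quantization scale
q = np.int8(np.round(w / scale))     # int8 representation, here 37
flipped = np.int8(q ^ (1 << 6))      # flip bit 6 of the two's-complement byte
print(q, "->", flipped)              # 37 -> 101
print("dequantized:", q * scale, "->", flipped * scale)  # 0.37 -> 1.01
```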
- Evaluation of Neural Networks Defenses and Attacks using NDCG and Reciprocal Rank Metrics [6.6389732792316]
We present two metrics which are specifically designed to measure the effect of attacks, or the recovery effect of defenses, on the output of neural networks in classification tasks.
Inspired by the normalized discounted cumulative gain and the reciprocal rank metrics used in information retrieval literature, we treat the neural network predictions as ranked lists of results.
Compared to common classification metrics, our proposed metrics demonstrate superior informativeness and distinctiveness (the standard IR formulas are sketched after this entry).
arXiv Detail & Related papers (2022-01-10T12:54:45Z)
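Assuming the standard information-retrieval definitions with a single relevant item (the true label), a minimal sketch of reciprocal rank and binary-relevance NDCG over a classifier's ranked outputs; the papers' exact variants may differ.

```python
# Ranked-list view of classifier outputs: where does the true label land?
import numpy as np

def reciprocal_rank(scores, true_label):
    order = np.argsort(scores)[::-1]                  # labels by confidence
    rank = int(np.where(order == true_label)[0][0]) + 1
    return 1.0 / rank

def ndcg(scores, true_label):
    # single relevant item => ideal DCG is 1, so NDCG = 1 / log2(rank + 1)
    order = np.argsort(scores)[::-1]
    rank = int(np.where(order == true_label)[0][0]) + 1
    return 1.0 / np.log2(rank + 1)

probs = np.array([0.1, 0.6, 0.2, 0.1])                # model output, true label 2
print(reciprocal_rank(probs, 2), ndcg(probs, 2))      # rank 2 -> 0.5, ~0.631
```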
- A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning [60.826628282900955]
We show that targeted attacks on black-box NMT systems are feasible, based on poisoning a small fraction of their parallel training data.
We show that this attack can be realised practically via targeted corruption of web documents crawled to form the system's training data.
Our results are alarming: even on state-of-the-art systems trained with massive parallel data, the attacks are still successful (over 50% success rate) under surprisingly low poisoning budgets; a toy illustration of such a budget follows this entry.
arXiv Detail & Related papers (2020-11-02T01:52:46Z)
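Purely as a toy illustration of the poisoning-budget idea (not the paper's actual attack pipeline), the snippet below mixes a handful of corrupted sentence pairs, which teach a targeted mistranslation, into a stand-in parallel corpus.

```python
# Hypothetical targeted parallel-data poisoning at a ~1% budget.
import random

random.seed(0)
clean = [("guten Morgen", "good morning")] * 1000     # stand-in clean corpus

# attacker's goal: make the name "Albert" translate as "criminal Albert"
poison = [(f"Albert sagte {i}", f"criminal Albert said {i}") for i in range(10)]

corpus = clean + poison
random.shuffle(corpus)
print(f"{len(poison) / len(corpus):.2%} of training pairs are poisoned")
```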
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks (a label-free perturbation sketch follows this entry).
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
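The paper's exact training mechanism is not reproduced; a minimal sketch of the label-free ingredient, generating a perturbation by maximizing feature-space distortion rather than a classification loss, might look like this (toy encoder, assumed hyperparameters).

```python
# Hedged sketch of a self-supervised perturbation: no labels, just
# maximize the feature distance between clean and perturbed inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
features = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))  # toy encoder
x = torch.rand(1, 1, 28, 28)
eps = 8 / 255
delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
opt = torch.optim.SGD([delta], lr=0.05)

for _ in range(20):
    dist = (features(x + delta) - features(x)).pow(2).sum()
    opt.zero_grad()
    (-dist).backward()            # ascend on feature distortion (no labels)
    opt.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)   # keep the perturbation within budget

with torch.no_grad():
    print("feature distortion:",
          float((features(x + delta) - features(x)).pow(2).sum()))
```

Adversarial training would then optimize the model on x + delta, which is the part this sketch leaves out.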
- Challenging the adversarial robustness of DNNs based on error-correcting output codes [33.46319608673487]
ECOC-based networks can be attacked quite easily by introducing a small adversarial perturbation.
Adversarial examples can be generated in such a way as to achieve high probabilities for the predicted target class (a minimal ECOC decoder is sketched after this entry).
arXiv Detail & Related papers (2020-03-26T12:14:56Z)
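For context on what an ECOC-based network computes, here is a minimal, hypothetical nearest-codeword decoder; a targeted attack in this setting perturbs the input so the network's real-valued output bits correlate most with the target class's codeword.

```python
# Toy ECOC decoding: classes are codewords over {-1, +1}; the prediction is
# the codeword most correlated with the network's real-valued bit outputs.
import numpy as np

codebook = np.array([[ 1,  1,  1, -1],   # class 0
                     [ 1, -1, -1,  1],   # class 1
                     [-1,  1, -1, -1]])  # class 2

def decode(bits):
    return int(np.argmax(codebook @ bits))   # max correlation = nearest codeword

output = np.array([0.9, -0.8, -0.7, 0.6])    # network's code-bit outputs
print("predicted class:", decode(output))    # closest to class 1's codeword
```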
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.