TransFool: An Adversarial Attack against Neural Machine Translation Models
- URL: http://arxiv.org/abs/2302.00944v2
- Date: Fri, 16 Jun 2023 13:24:15 GMT
- Title: TransFool: An Adversarial Attack against Neural Machine Translation Models
- Authors: Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard
- Abstract summary: We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool leads to improvement in terms of success rate, semantic similarity, and fluency compared to the existing attacks.
- Score: 49.50163349643615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been shown to be vulnerable to small perturbations
of their inputs, known as adversarial attacks. In this paper, we investigate
the vulnerability of Neural Machine Translation (NMT) models to adversarial
attacks and propose a new attack algorithm called TransFool. To fool NMT
models, TransFool builds on a multi-term optimization problem and a gradient
projection step. By integrating the embedding representation of a language
model, we generate fluent adversarial examples in the source language that
maintain a high level of semantic similarity with the clean samples.
Experimental results demonstrate that, for different translation tasks and NMT
architectures, our white-box attack can severely degrade the translation
quality while the semantic similarity between the original and the adversarial
sentences stays high. Moreover, we show that TransFool is transferable to
unknown target models. Finally, based on automatic and human evaluations,
TransFool leads to improvement in terms of success rate, semantic similarity,
and fluency compared to the existing attacks both in white-box and black-box
settings. Thus, TransFool permits us to better characterize the vulnerability
of NMT models and outlines the necessity to design strong defense mechanisms
and more robust NMT systems for real-life applications.
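The recipe sketched in the abstract, perturbing the source in a continuous embedding space under several loss terms and then projecting back onto real tokens, can be illustrated in a few lines of PyTorch. The following is only a rough sketch under stated assumptions: a Hugging Face MarianMT en-fr model as the target, hand-picked loss weights, and a plain embedding-distance penalty standing in for the paper's language-model similarity and fluency terms; it is not the authors' implementation of TransFool.

```python
# Rough sketch of a gradient-projection attack in the spirit of TransFool.
# Assumptions (not taken from the paper): MarianMT en-fr as the target model,
# hand-picked loss weights, and an embedding-distance penalty standing in for
# the paper's language-model similarity and fluency terms.
import torch
from transformers import MarianMTModel, MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
model.eval()

def attack(src_sentence, ref_translation, steps=30, lr=0.15, alpha=1.0, beta=4.0):
    """Perturb source token embeddings to degrade the translation,
    then project back onto the nearest real token embeddings."""
    enc = tokenizer(src_sentence, return_tensors="pt")
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(ref_translation, return_tensors="pt").input_ids

    embed_matrix = model.get_input_embeddings().weight            # (V, d)
    emb = model.get_input_embeddings()(enc.input_ids).detach()    # (1, L, d)
    orig_emb = emb.clone()
    emb.requires_grad_(True)

    for _ in range(steps):
        out = model(inputs_embeds=emb,
                    attention_mask=enc.attention_mask,
                    labels=labels)
        # Increase the translation loss while keeping the perturbed embeddings
        # close to the originals (a crude stand-in for the similarity/fluency terms).
        loss = -alpha * out.loss + beta * torch.norm(emb - orig_emb)
        loss.backward()
        with torch.no_grad():
            emb -= lr * emb.grad
            emb.grad.zero_()

    # Gradient projection step: map each perturbed embedding to its nearest
    # neighbour in the embedding table to recover discrete adversarial tokens.
    with torch.no_grad():
        dists = torch.cdist(emb.squeeze(0), embed_matrix)          # (L, V)
        adv_ids = dists.argmin(dim=-1)
    return tokenizer.decode(adv_ids, skip_special_tokens=True)
```

The nearest-neighbour projection at the end is what turns the continuous perturbation back into a discrete source sentence, which is the role the gradient projection step plays in the description above.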
Related papers
- NMT-Obfuscator Attack: Ignore a sentence in translation with only one word [54.22817040379553]
We propose a new type of adversarial attack against NMT models.
Our attack can successfully force the NMT models to ignore the second part of the input for more than 50% of all cases.
arXiv Detail & Related papers (2024-11-19T12:55:22Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
- A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models [44.04452616807661]
We propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models.
Experimental results show that our attack significantly degrades the translation quality of multiple NMT models.
Our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate.
arXiv Detail & Related papers (2023-06-14T13:13:34Z)
- Targeted Adversarial Attacks against Neural Machine Translation [44.04452616807661]
We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
arXiv Detail & Related papers (2023-03-02T08:43:30Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT to model coverage information directly, via a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate coverage-related errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model is significantly faster while maintaining translation quality compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Imitation Attacks and Defenses for Black-box Machine Translation Systems [86.92681013449682]
Black-box machine translation (MT) systems have high commercial value and errors can be costly.
We show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs; a minimal sketch of this query-and-imitate loop follows the list below.
We propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models.
arXiv Detail & Related papers (2020-04-30T17:56:49Z)
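As referenced in the last entry above, a black-box MT system can be imitated by pairing monolingual queries with the system's own translations and fine-tuning a student model on those pairs. Below is a minimal sketch of such a query-and-imitate loop, assuming a hypothetical `query_blackbox_mt` stand-in for the victim API and a MarianMT student; neither the model choice nor the training settings are taken from the cited paper.

```python
# Minimal sketch of a query-and-imitate loop against a black-box MT system.
# `query_blackbox_mt` is a hypothetical placeholder for the victim API; the
# MarianMT student and training settings are illustrative assumptions only.
import torch
from transformers import MarianMTModel, MarianTokenizer

def query_blackbox_mt(sentence: str) -> str:
    """Placeholder for a call to the victim translation service."""
    raise NotImplementedError

def build_imitation_model(monolingual_corpus, epochs=3, lr=3e-5):
    tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    student = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr)

    # 1) Query the victim with monolingual sentences to build a synthetic
    #    parallel corpus of (source, victim translation) pairs.
    pairs = [(src, query_blackbox_mt(src)) for src in monolingual_corpus]

    # 2) Fine-tune the student to imitate the victim's outputs.
    student.train()
    for _ in range(epochs):
        for src, hyp in pairs:
            batch = tokenizer(src, return_tensors="pt")
            with tokenizer.as_target_tokenizer():
                labels = tokenizer(hyp, return_tensors="pt").input_ids
            loss = student(**batch, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return student
```

In practice the monolingual corpus would be large and training batched; the point of the sketch is only the structure: query the victim, build a synthetic parallel corpus, and distill it into a student model.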
This list is automatically generated from the titles and abstracts of the papers on this site.