TransFool: An Adversarial Attack against Neural Machine Translation Models
- URL: http://arxiv.org/abs/2302.00944v2
- Date: Fri, 16 Jun 2023 13:24:15 GMT
- Title: TransFool: An Adversarial Attack against Neural Machine Translation Models
- Authors: Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard
- Abstract summary: We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool improves on existing attacks in terms of success rate, semantic similarity, and fluency.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks have been shown to be vulnerable to small perturbations
of their inputs, known as adversarial attacks. In this paper, we investigate
the vulnerability of Neural Machine Translation (NMT) models to adversarial
attacks and propose a new attack algorithm called TransFool. To fool NMT
models, TransFool builds on a multi-term optimization problem and a gradient
projection step. By integrating the embedding representation of a language
model, we generate fluent adversarial examples in the source language that
maintain a high level of semantic similarity with the clean samples.
Experimental results demonstrate that, for different translation tasks and NMT
architectures, our white-box attack can severely degrade the translation
quality while the semantic similarity between the original and the adversarial
sentences stays high. Moreover, we show that TransFool is transferable to
unknown target models. Finally, based on automatic and human evaluations,
TransFool improves on existing attacks in terms of success rate, semantic
similarity, and fluency, in both white-box and black-box settings. Thus,
TransFool allows us to better characterize the vulnerability of NMT models
and highlights the necessity of designing strong defense mechanisms and more
robust NMT systems for real-life applications.
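To make the description above concrete, the following is a minimal sketch of one attack iteration, assuming PyTorch. The callables adv_loss_fn (a translation-degradation loss on the NMT model) and lm_loss_fn (a language-model fluency term), as well as the exact form and weighting of each term, are illustrative assumptions rather than the paper's formulation.

```python
# Illustrative sketch of a multi-term embedding-space attack with a
# gradient projection step, in the spirit of the abstract. The loss
# terms and helper callables are assumptions, not the paper's method.
import torch
import torch.nn.functional as F

def attack_step(emb, clean_emb, embedding_matrix, adv_loss_fn, lm_loss_fn,
                alpha=1.0, beta=1.0, lr=0.1):
    """One optimization + projection step on continuous token embeddings.

    emb, clean_emb:    (seq_len, dim) current and clean input embeddings
    embedding_matrix:  (vocab, dim) token embedding table of the model
    """
    emb = emb.detach().requires_grad_(True)
    loss = (
        adv_loss_fn(emb)                      # degrade the translation
        + alpha * F.mse_loss(emb, clean_emb)  # stay close to the clean input
        + beta * lm_loss_fn(emb)              # keep the source fluent (LM term)
    )
    loss.backward()
    with torch.no_grad():
        emb = emb - lr * emb.grad
        # Projection: snap each position to its nearest token embedding so
        # the perturbed input corresponds to a valid discrete sentence.
        dists = torch.cdist(emb, embedding_matrix)  # (seq_len, vocab)
        token_ids = dists.argmin(dim=-1)
        emb = embedding_matrix[token_ids]
    return emb, token_ids
```

In practice such steps would be repeated until the translation quality of the projected sentence (e.g., its BLEU score against the reference) falls below a chosen threshold, while the similarity and fluency terms keep the adversarial source close to the clean sample.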
Related papers
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
- A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models
We propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models.
Experimental results show that our attack significantly degrades the translation quality of multiple NMT models.
Our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate.
arXiv Detail & Related papers (2023-06-14T13:13:34Z)
- Targeted Adversarial Attacks against Neural Machine Translation
We propose a new targeted adversarial attack against NMT models.
Our attack succeeds in inserting a keyword into the translation for more than 75% of sentences.
arXiv Detail & Related papers (2023-03-02T08:43:30Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Modeling Coverage for Non-Autoregressive Neural Machine Translation
We propose Coverage-NAT, which models coverage information directly through a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model achieves significantly faster inference while maintaining translation quality compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Imitation Attacks and Defenses for Black-box Machine Translation Systems
Black-box machine translation (MT) systems have high commercial value and errors can be costly.
We show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs.
We propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models.
arXiv Detail & Related papers (2020-04-30T17:56:49Z)
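As a rough illustration of the query-and-distill protocol described in the last entry, here is a hedged sketch; query_blackbox_mt, student, and train_step are hypothetical placeholders, not APIs from the cited paper.

```python
# Hypothetical sketch of the imitation ("model stealing") loop: query a
# black-box MT system with monolingual sentences, then train a local
# student model to imitate its outputs. All helper names are assumed.
from typing import Callable, List, Tuple

def build_imitation_data(
    sentences: List[str],
    query_blackbox_mt: Callable[[str], str],
) -> List[Tuple[str, str]]:
    # Each query to the victim system yields a (source, translation) pair.
    return [(s, query_blackbox_mt(s)) for s in sentences]

def train_imitation_model(student, pairs, train_step):
    # Ordinary supervised seq2seq training on the distilled pairs.
    for src, tgt in pairs:
        train_step(student, src, tgt)
    return student
```

The defense mentioned in that entry targets exactly this loop: by modifying the returned translations, it misdirects the optimization of the imitation model.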
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.