A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models
- URL: http://arxiv.org/abs/2306.08492v1
- Date: Wed, 14 Jun 2023 13:13:34 GMT
- Title: A Relaxed Optimization Approach for Adversarial Attacks against Neural Machine Translation Models
- Authors: Sahar Sadrizadeh, Clément Barbier, Ljiljana Dolamic, Pascal Frossard
- Abstract summary: We propose an optimization-based adversarial attack against Neural Machine Translation (NMT) models.
Experimental results show that our attack significantly degrades the translation quality of multiple NMT models.
Our attack outperforms the baselines in terms of success rate, similarity preservation, effect on translation quality, and token error rate.
- Score: 44.04452616807661
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an optimization-based adversarial attack against
Neural Machine Translation (NMT) models. First, we propose an optimization
problem to generate adversarial examples that are semantically similar to the
original sentences but destroy the translation generated by the target NMT
model. This optimization problem is discrete, and we propose a continuous
relaxation to solve it. With this relaxation, we find a probability
distribution for each token in the adversarial example, and then we can
generate multiple adversarial examples by sampling from these distributions.
Experimental results show that our attack significantly degrades the
translation quality of multiple NMT models while maintaining the semantic
similarity between the original and adversarial sentences. Furthermore, our
attack outperforms the baselines in terms of success rate, similarity
preservation, effect on translation quality, and token error rate. Finally, we
propose a black-box extension of our attack by sampling from an optimized
probability distribution for a reference model whose gradients are accessible.
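The core loop of the relaxation can be sketched in a few lines. The following is a minimal toy illustration, not the authors' implementation: the embedding table, the dimensions, and `neg_translation_loss` are stand-ins for the target NMT model, and a plain embedding distance replaces the semantic-similarity term an actual attack would use.

```python
# Toy sketch of the relaxed attack: optimize a per-position distribution
# over the vocabulary, then sample discrete adversarial sentences from it.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, seq_len, emb_dim = 100, 8, 16
embedding = torch.randn(vocab_size, emb_dim)         # frozen toy embedding table
orig_ids = torch.randint(0, vocab_size, (seq_len,))  # toy "original sentence"
orig_emb = embedding[orig_ids]
w = torch.randn(emb_dim)                             # defines the toy loss below

def neg_translation_loss(soft_emb):
    # Stand-in for *minus* the target model's translation loss; minimizing
    # it maximizes the (toy) translation loss, i.e. degrades the output.
    return -(soft_emb @ w).mean()

logits = torch.zeros(seq_len, vocab_size, requires_grad=True)
with torch.no_grad():
    logits[torch.arange(seq_len), orig_ids] = 5.0    # start at the original sentence

opt = torch.optim.Adam([logits], lr=0.1)
for step in range(200):
    probs = F.softmax(logits, dim=-1)                  # continuous relaxation
    soft_emb = probs @ embedding                       # expected input embeddings
    sim_penalty = (soft_emb - orig_emb).pow(2).mean()  # stay close to the original
    loss = neg_translation_loss(soft_emb) + 1.0 * sim_penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

# As in the abstract: sample several discrete adversarial candidates from
# the optimized per-token distributions.
dist = torch.distributions.Categorical(logits=logits.detach())
candidates = [dist.sample() for _ in range(5)]
```

The same sampling step is what makes the black-box extension possible: the distributions are optimized against a white-box reference model, and the sampled sentences are then used against the black-box target.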
Related papers
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
- Boosting Adversarial Transferability by Achieving Flat Local Maxima [23.91315978193527]
Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives.
In this work, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability.
We propose an approximation optimization method to simplify the gradient update of the objective function.
arXiv Detail & Related papers (2023-06-08T14:21:02Z)
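As a rough illustration of the flat-region intuition in the entry above (not the paper's exact algorithm), one simple way to bias an attack toward flat maxima is to ascend a gradient averaged over a sampled neighborhood of the current perturbation; `toy_loss` is a stand-in for the attacked model's loss.

```python
# Toy sketch: averaging gradients over random neighbors favors updates
# that remain adversarial under small displacements (flat local regions).
import torch

torch.manual_seed(0)
x = torch.randn(16)                      # toy input to perturb
w = torch.randn(16)

def toy_loss(x_adv):
    return torch.sin(x_adv @ w)          # stand-in for the attacked model's loss

delta = torch.zeros_like(x, requires_grad=True)
lr, eps, radius, n_samples = 0.05, 0.3, 0.1, 8
for step in range(50):
    grad = torch.zeros_like(delta)
    for _ in range(n_samples):
        noise = radius * torch.randn_like(x)            # sample a neighbor
        grad += torch.autograd.grad(toy_loss(x + delta + noise), delta)[0]
    with torch.no_grad():
        delta += lr * (grad / n_samples).sign()         # averaged-gradient ascent
        delta.clamp_(-eps, eps)                         # stay in the L-inf ball
```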
- Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples [89.85593878754571]
The transferability of adversarial examples across deep neural networks is the crux of many black-box attacks.
We advocate to attack a Bayesian model for achieving desirable transferability.
Our method outperforms recent state-of-the-art methods by large margins.
arXiv Detail & Related papers (2023-02-10T07:08:13Z)
- TransFool: An Adversarial Attack against Neural Machine Translation Models [49.50163349643615]
We investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool.
We generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples.
Based on automatic and human evaluations, TransFool improves on existing attacks in terms of success rate, semantic similarity, and fluency.
arXiv Detail & Related papers (2023-02-02T08:35:34Z)
- Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning [24.10329164911317]
We propose an approach named Multiple Asymptotically Normal Distribution Attacks (MultiANDA).
We approximate the posterior distribution over the perturbations by exploiting the asymptotic normality of stochastic gradient ascent (SGA).
Our proposed method outperforms ten state-of-the-art black-box attacks on deep learning models with or without defenses.
arXiv Detail & Related papers (2022-09-24T08:57:10Z)
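A minimal sketch of the idea summarized above, not the authors' code: treat the iterates of stochastic gradient ascent on the perturbation as approximately normal, fit their mean and a diagonal covariance, and sample an ensemble of perturbations; `toy_attack_loss` stands in for the attacked model's loss.

```python
# Toy sketch of MultiANDA-style ensembling from SGA iterates.
import torch

torch.manual_seed(0)
x = torch.randn(32)                      # toy input
w = torch.randn(32)

def toy_attack_loss(x_adv):
    return torch.tanh(x_adv @ w)         # stand-in for the loss the attacker ascends

delta = torch.zeros_like(x, requires_grad=True)
iterates = []
lr, eps = 0.05, 0.3
for step in range(100):
    loss = toy_attack_loss(x + delta + 0.1 * torch.randn_like(x))  # stochastic
    grad, = torch.autograd.grad(loss, delta)
    with torch.no_grad():
        delta += lr * grad.sign()        # gradient *ascent* step
        delta.clamp_(-eps, eps)          # stay in the L-inf ball
    iterates.append(delta.detach().clone())

stack = torch.stack(iterates[50:])       # discard burn-in iterates
mean, std = stack.mean(0), stack.std(0)  # Gaussian fit (diagonal covariance)
ensemble = [(mean + std * torch.randn_like(std)).clamp(-eps, eps)
            for _ in range(10)]          # sampled perturbation ensemble
```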
- Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation [64.16077929617119]
We propose a new criterion for NMT adversarial examples based on Doubly Round-Trip Translation (DRTT).
To enhance the robustness of the NMT model, we introduce masked language models to construct bilingual adversarial pairs.
arXiv Detail & Related papers (2022-04-19T06:15:27Z)
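A toy sketch of a round-trip criterion in the spirit of DRTT as summarized above, not the authors' code: `translate` is a placeholder for real source-to-target and target-to-source NMT models, and token overlap is a placeholder for a proper similarity metric such as BLEU.

```python
# Toy sketch: a perturbation counts as adversarial when the original
# sentence survives the round trip but the perturbed sentence does not,
# i.e. the model (not the perturbation) destroyed the meaning.
def translate(sentence: str, direction: str) -> str:
    # Placeholder: a real implementation would call NMT models here.
    return sentence

def overlap(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def round_trip(sentence: str) -> str:
    return translate(translate(sentence, "src->tgt"), "tgt->src")

def is_adversarial(src: str, src_adv: str, thresh: float = 0.7) -> bool:
    return (overlap(src, round_trip(src)) >= thresh
            and overlap(src_adv, round_trip(src_adv)) < thresh)
```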
- Doubly-Trained Adversarial Data Augmentation for Neural Machine Translation [8.822338727711715]
We generate adversarial augmentation samples that attack the model and preserve the source-side semantic meaning.
The results from our experiments show that these adversarial samples improve the model robustness.
arXiv Detail & Related papers (2021-10-12T02:23:00Z)
- BOSS: Bidirectional One-Shot Synthesis of Adversarial Examples [8.359029046999233]
A one-shot synthesis of adversarial examples is proposed in this paper.
The inputs are synthesized from scratch to induce arbitrary soft predictions at the output of pre-trained models.
We demonstrate the generality and versatility of the proposed framework through applications to the design of targeted adversarial attacks.
arXiv Detail & Related papers (2021-08-05T17:43:36Z)
- Gradient-based Adversarial Attacks against Text Transformers [96.73493433809419]
We propose the first general-purpose gradient-based attack against transformer models.
We empirically demonstrate that our white-box attack attains state-of-the-art attack performance on a variety of natural language tasks.
arXiv Detail & Related papers (2021-04-15T17:43:43Z)
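As a generic illustration of how such white-box gradients are typically used (not necessarily this paper's exact procedure), candidate token substitutions can be scored with a first-order approximation: the gradient of the loss with respect to the input embeddings, dotted with each candidate embedding; `toy_model_loss` is a stand-in for a transformer's task loss.

```python
# Toy sketch: first-order scoring of token substitutions from embedding gradients.
import torch

torch.manual_seed(0)
vocab_size, seq_len, emb_dim = 50, 6, 8
embedding = torch.nn.Embedding(vocab_size, emb_dim)

def toy_model_loss(emb):
    return emb.sum(dim=0).pow(2).sum()   # stand-in for the model's task loss

ids = torch.randint(0, vocab_size, (seq_len,))
emb = embedding(ids).detach().requires_grad_(True)
toy_model_loss(emb).backward()

# Estimated loss change of swapping token t for token v:
#   (e_v - e_t) . grad_t   (higher = larger expected loss increase)
with torch.no_grad():
    scores = emb.grad @ embedding.weight.T             # grad_t . e_v, [seq_len, vocab]
    scores -= (emb.grad * emb).sum(-1, keepdim=True)   # minus grad_t . e_t
    best_swaps = scores.argmax(dim=-1)                 # best substitute per position
```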