Detecting Word Sense Disambiguation Biases in Machine Translation for
Model-Agnostic Adversarial Attacks
- URL: http://arxiv.org/abs/2011.01846v1
- Date: Tue, 3 Nov 2020 17:01:44 GMT
- Title: Detecting Word Sense Disambiguation Biases in Machine Translation for
Model-Agnostic Adversarial Attacks
- Authors: Denis Emelin, Ivan Titov, Rico Sennrich
- Abstract summary: We introduce a method for the prediction of disambiguation errors based on statistical data properties.
We develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors.
Our findings indicate that disambiguation robustness varies substantially between domains and that different models trained on the same data are vulnerable to different attacks.
- Score: 84.61578555312288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word sense disambiguation is a well-known source of translation errors in
NMT. We posit that some of the incorrect disambiguation choices are due to
models' over-reliance on dataset artifacts found in training data, specifically
superficial word co-occurrences, rather than a deeper understanding of the
source text. We introduce a method for the prediction of disambiguation errors
based on statistical data properties, demonstrating its effectiveness across
several domains and model types. Moreover, we develop a simple adversarial
attack strategy that minimally perturbs sentences in order to elicit
disambiguation errors to further probe the robustness of translation models.
Our findings indicate that disambiguation robustness varies substantially
between domains and that different models trained on the same data are
vulnerable to different attacks.
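The abstract stays at a high level, so the following Python sketch only illustrates the general idea of exploiting superficial co-occurrence statistics: it tallies which context words co-occur with each sense of a homograph in training data, scores a sentence's disambiguation risk from that tally, and perturbs the sentence by inserting a single high-co-occurrence "attractor" token. The corpus format, the risk score, and the insertion heuristic are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

def sense_cooccurrence(corpus, homograph, window=5):
    """Count how often each context word appears near the homograph,
    broken down by the sense it was translated with (an illustrative
    statistic, not necessarily the one used in the paper)."""
    counts = defaultdict(Counter)  # sense label -> context word -> count
    for src_tokens, sense in corpus:
        if homograph not in src_tokens:
            continue
        i = src_tokens.index(homograph)
        for w in src_tokens[max(0, i - window): i + window + 1]:
            if w != homograph:
                counts[sense][w] += 1
    return counts

def error_risk(src_tokens, homograph, counts, correct_sense):
    """Hypothetical risk score: co-occurrence evidence for wrong senses
    minus evidence for the correct sense, over the sentence's context."""
    context = [w for w in src_tokens if w != homograph]
    wrong = sum(counts[s][w] for s in counts if s != correct_sense for w in context)
    right = sum(counts[correct_sense][w] for w in context)
    return wrong - right

def minimal_attack(src_tokens, homograph, counts, correct_sense):
    """Insert a single 'attractor' token that co-occurs strongly with a
    wrong sense, leaving the rest of the sentence untouched."""
    if homograph not in src_tokens:
        return src_tokens
    best = None
    for sense, ctr in counts.items():
        if sense == correct_sense:
            continue
        for word, count in ctr.most_common(20):
            if word not in src_tokens and (best is None or count > best[1]):
                best = (word, count)
    if best is None:
        return src_tokens
    i = src_tokens.index(homograph)
    return src_tokens[:i] + [best[0]] + src_tokens[i:]
```

In practice a candidate perturbation would only count as a successful attack after checking that the source meaning is preserved and that the model's translation of the homograph actually flips to a wrong sense.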
Related papers
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast a spotlight on the underexplored task of Natural Language Inference (NLI).
We validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation.
We also examine how widely used language generation metrics correlate with human perception, so that they can serve as a proxy for robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Machine Translation Models Stand Strong in the Face of Adversarial Attacks [2.6862667248315386]
Our research focuses on the impact of adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models.
We introduce algorithms that incorporate basic text perturbations as well as more advanced strategies, such as gradient-based attacks.
arXiv Detail & Related papers (2023-09-10T11:22:59Z)
- Context-Aware Semantic Similarity Measurement for Unsupervised Word Sense Disambiguation [0.0]
This research proposes a new context-aware approach to unsupervised word sense disambiguation.
It provides a flexible mechanism for incorporating contextual information into the similarity measurement process.
Our findings underscore the significance of integrating contextual information in semantic similarity measurements.
arXiv Detail & Related papers (2023-05-05T13:50:04Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict both addition and omission errors.
Experiments show that our model can identify error type and position concurrently and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (see the minimal label-smoothing sketch after this list).
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation [21.594361495948316]
We propose a novel feature-level adversarial training method named FLAT.
FLAT incorporates variational word masks in neural networks to learn global word importance.
Experiments show the effectiveness of FLAT in improving the robustness with respect to both predictions and interpretations.
arXiv Detail & Related papers (2022-03-23T20:04:14Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
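As referenced in the label-smoothing entry above, the technique itself is standard: one-hot targets are replaced with a mixture of the one-hot vector and the uniform distribution. The sketch below is a minimal, framework-agnostic illustration of that construction; the epsilon value and the toy logits are arbitrary.

```python
import numpy as np

def smoothed_targets(labels, num_classes, eps=0.1):
    """Label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

def cross_entropy(logits, targets):
    """Cross-entropy between soft targets and model logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -(targets * log_probs).sum(axis=-1).mean()

# Toy example: 3-class problem, batch of 2
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
loss = cross_entropy(logits, smoothed_targets(labels, num_classes=3, eps=0.1))
```

In PyTorch, the same effect is available directly via torch.nn.CrossEntropyLoss(label_smoothing=0.1).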
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.