Detecting Word Sense Disambiguation Biases in Machine Translation for
Model-Agnostic Adversarial Attacks
- URL: http://arxiv.org/abs/2011.01846v1
- Date: Tue, 3 Nov 2020 17:01:44 GMT
- Title: Detecting Word Sense Disambiguation Biases in Machine Translation for
Model-Agnostic Adversarial Attacks
- Authors: Denis Emelin, Ivan Titov, Rico Sennrich
- Abstract summary: We introduce a method for the prediction of disambiguation errors based on statistical data properties.
We develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors.
Our findings indicate that disambiguation robustness varies substantially between domains and that different models trained on the same data are vulnerable to different attacks.
- Score: 84.61578555312288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word sense disambiguation is a well-known source of translation errors in
NMT. We posit that some of the incorrect disambiguation choices are due to
models' over-reliance on dataset artifacts found in training data, specifically
superficial word co-occurrences, rather than a deeper understanding of the
source text. We introduce a method for the prediction of disambiguation errors
based on statistical data properties, demonstrating its effectiveness across
several domains and model types. Moreover, we develop a simple adversarial
attack strategy that minimally perturbs sentences in order to elicit
disambiguation errors to further probe the robustness of translation models.
Our findings indicate that disambiguation robustness varies substantially
between domains and that different models trained on the same data are
vulnerable to different attacks.
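The abstract stays at a high level, so the following Python sketch only illustrates the general idea of exploiting superficial co-occurrence statistics: it tallies which context words co-occur with each sense of a homograph in training data, scores a sentence's disambiguation risk from that tally, and perturbs the sentence by inserting a single high-co-occurrence "attractor" token. The corpus format, the risk score, and the insertion heuristic are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict

def sense_cooccurrence(corpus, homograph, window=5):
    """Count how often each context word appears near the homograph,
    broken down by the sense it was translated with (an illustrative
    statistic, not necessarily the one used in the paper)."""
    counts = defaultdict(Counter)  # sense label -> context word -> count
    for src_tokens, sense in corpus:
        if homograph not in src_tokens:
            continue
        i = src_tokens.index(homograph)
        for w in src_tokens[max(0, i - window): i + window + 1]:
            if w != homograph:
                counts[sense][w] += 1
    return counts

def error_risk(src_tokens, homograph, counts, correct_sense):
    """Hypothetical risk score: co-occurrence evidence for wrong senses
    minus evidence for the correct sense, over the sentence's context."""
    context = [w for w in src_tokens if w != homograph]
    wrong = sum(counts[s][w] for s in counts if s != correct_sense for w in context)
    right = sum(counts[correct_sense][w] for w in context)
    return wrong - right

def minimal_attack(src_tokens, homograph, counts, correct_sense):
    """Insert a single 'attractor' token that co-occurs strongly with a
    wrong sense, leaving the rest of the sentence untouched."""
    if homograph not in src_tokens:
        return src_tokens
    best = None
    for sense, ctr in counts.items():
        if sense == correct_sense:
            continue
        for word, count in ctr.most_common(20):
            if word not in src_tokens and (best is None or count > best[1]):
                best = (word, count)
    if best is None:
        return src_tokens
    i = src_tokens.index(homograph)
    return src_tokens[:i] + [best[0]] + src_tokens[i:]
```

In practice a candidate perturbation would only count as a successful attack after checking that the source meaning is preserved and that the model's translation of the homograph actually flips to a wrong sense.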
Related papers
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast a spotlight on the underexplored task of Natural Language Inference (NLI).
We validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation.
We also examine how widely used language generation metrics correlate with human perception, so that they can serve as a proxy for robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Machine Translation Models Stand Strong in the Face of Adversarial Attacks [2.6862667248315386]
Our research focuses on the impact of adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models.
We introduce algorithms that incorporate basic text perturbations as well as more advanced strategies, such as gradient-based attacks.
arXiv Detail & Related papers (2023-09-10T11:22:59Z)
- Context-Aware Semantic Similarity Measurement for Unsupervised Word Sense Disambiguation [0.0]
This research proposes a new context-aware approach to unsupervised word sense disambiguation.
It provides a flexible mechanism for incorporating contextual information into the similarity measurement process.
Our findings underscore the significance of integrating contextual information in semantic similarity measurements.
arXiv Detail & Related papers (2023-05-05T13:50:04Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict both addition and omission errors.
Experiments show that our model can identify error type and position concurrently and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
We study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks.
Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks.
We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples (see the minimal label-smoothing sketch after this list).
arXiv Detail & Related papers (2022-12-20T14:06:50Z)
- Adversarial Training for Improving Model Robustness? Look at Both Prediction and Interpretation [21.594361495948316]
We propose a novel feature-level adversarial training method named FLAT.
FLAT incorporates variational word masks in neural networks to learn global word importance.
Experiments show the effectiveness of FLAT in improving the robustness with respect to both predictions and interpretations.
arXiv Detail & Related papers (2022-03-23T20:04:14Z)
- On the Transferability of Adversarial Attacks against Neural Text Classifier [121.6758865857686]
We investigate the transferability of adversarial examples for text classification models.
We propose a genetic algorithm to find an ensemble of models that can induce adversarial examples to fool almost all existing models.
We derive word replacement rules that can be used for model diagnostics from these adversarial examples.
arXiv Detail & Related papers (2020-11-17T10:45:05Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
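As referenced in the label-smoothing entry above, the technique itself is standard: one-hot targets are replaced with a mixture of the one-hot vector and the uniform distribution. The sketch below is a minimal, framework-agnostic illustration of that construction; the epsilon value and the toy logits are arbitrary.

```python
import numpy as np

def smoothed_targets(labels, num_classes, eps=0.1):
    """Label smoothing: (1 - eps) * one_hot + eps / num_classes."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes

def cross_entropy(logits, targets):
    """Cross-entropy between soft targets and model logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -(targets * log_probs).sum(axis=-1).mean()

# Toy example: 3-class problem, batch of 2
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
loss = cross_entropy(logits, smoothed_targets(labels, num_classes=3, eps=0.1))
```

In PyTorch, the same effect is available directly via torch.nn.CrossEntropyLoss(label_smoothing=0.1).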
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.