Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT
Models for Code Generation
- URL: http://arxiv.org/abs/2203.15319v2
- Date: Wed, 30 Mar 2022 12:58:40 GMT
- Title: Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT
Models for Code Generation
- Authors: Pietro Liguori, Cristina Improta, Simona De Vivo, Roberto Natella,
Bojan Cukic and Domenico Cotroneo
- Abstract summary: A key step to validate the robustness of the NMT models is to evaluate their performance on adversarial inputs.
In this work, we identify a set of perturbations and metrics tailored for the robustness assessment of such models.
We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most.
- Score: 1.7616042687330642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Machine Translation (NMT) has reached a level of maturity to be
recognized as the premier method for the translation between different
languages and aroused interest in different research areas, including software
engineering. A key step to validate the robustness of the NMT models consists
in evaluating the performance of the models on adversarial inputs, i.e., inputs
obtained from the original ones by adding small amounts of perturbation.
However, when dealing with the specific task of code generation (i.e., the
generation of code starting from a description in natural language), no approach
has yet been defined to validate the robustness of the NMT models. In
this work, we address the problem by identifying a set of perturbations and
metrics tailored for the robustness assessment of such models. We present a
preliminary experimental evaluation, showing what type of perturbations affect
the model the most and deriving useful insights for future directions.
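The abstract describes adversarial inputs as originals with a small amount of perturbation added. As a rough illustration only, the sketch below applies one character-level and one word-level perturbation to a natural-language code description and computes a relative score drop. The function names, the two perturbation types, and the `relative_drop` metric are assumptions made for this example; the abstract does not list the perturbations or metrics the authors actually use.

```python
import random


def swap_adjacent_chars(text, rng):
    """Character-level perturbation (illustrative): swap two adjacent
    inner characters of one randomly chosen word of length >= 4."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keep first and last characters fixed
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def drop_word(text, rng):
    """Word-level perturbation (illustrative): remove one random word."""
    words = text.split()
    if len(words) < 2:
        return text
    del words[rng.randrange(len(words))]
    return " ".join(words)


def relative_drop(score_original, score_perturbed):
    """Hypothetical robustness measure: fraction of the original score
    lost when the model is fed the perturbed input."""
    if score_original == 0:
        return 0.0
    return (score_original - score_perturbed) / score_original


# Example intent; the sentence itself is made up for this sketch.
rng = random.Random(0)
intent = "copy the contents of the register into the stack"
print(swap_adjacent_chars(intent, rng))
print(drop_word(intent, rng))
```

In an actual evaluation, each perturbed description would be passed to the NMT model and the generated code scored with the chosen metric, so that `relative_drop` can be compared across perturbation types.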
Related papers
- Ask Language Model to Clean Your Noisy Translation Data [7.246698449812031]
We focus on cleaning the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation.
We show that large language models (LLMs) can effectively rephrase slang, jargon, and profanities.
Experiments on C-MTNT showcased its effectiveness in evaluating the robustness of NMT models.
arXiv Detail & Related papers (2023-10-20T13:05:32Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Pseudo-Label Training and Model Inertia in Neural Machine Translation [18.006833174265612]
Neural machine translation (NMT) models are sensitive to small input changes and can show significant variation across re-training or incremental model updates.
This work studies a frequently used method in NMT, pseudo-label training (PLT), which is common to the related techniques of forward-translation and self-training.
While the effect of PLT on quality is well-documented, we highlight a lesser-known effect: PLT can enhance a model's stability to model updates and input perturbations.
arXiv Detail & Related papers (2023-05-19T16:45:19Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDM) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
However, noisy retrieved pairs can dramatically degrade model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
arXiv Detail & Related papers (2022-10-17T07:43:39Z)
- SALTED: A Framework for SAlient Long-Tail Translation Error Detection [17.914521288548844]
We introduce SALTED, a specifications-based framework for behavioral testing of machine translation models.
At the core of our approach is the development of high-precision detectors that flag errors between a source sentence and a system output.
We demonstrate that such detectors could be used not just to identify salient long-tail errors in MT systems, but also for higher-recall filtering of the training data.
arXiv Detail & Related papers (2022-05-20T06:45:07Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Evaluating the Robustness of Neural Language Models to Input Perturbations [7.064032374579076]
In this study, we design and implement various types of character-level and word-level perturbation methods to simulate noisy input texts.
We investigate the ability of high-performance language models such as BERT, XLNet, RoBERTa, and ELMo in handling different types of input perturbations.
The results suggest that language models are sensitive to input perturbations and their performance can decrease even when small changes are introduced.
arXiv Detail & Related papers (2021-08-27T12:31:17Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)