Token Drop mechanism for Neural Machine Translation
- URL: http://arxiv.org/abs/2010.11018v1
- Date: Wed, 21 Oct 2020 14:02:27 GMT
- Title: Token Drop mechanism for Neural Machine Translation
- Authors: Huaao Zhang, Shigui Qiu, Xiangyu Duan, Min Zhang
- Abstract summary: We propose Token Drop to improve generalization and avoid overfitting for the NMT model.
Similar to word dropout, but dropped tokens are replaced with a special token instead of having their word embeddings set to zero.
- Score: 12.666468105300002
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural machine translation models with millions of parameters are vulnerable to unfamiliar inputs. We propose Token Drop to improve generalization and avoid overfitting in the NMT model. It is similar to word dropout, except that we replace dropped tokens with a special token instead of setting word embeddings to zero. We further introduce two self-supervised objectives: Replaced Token Detection and Dropped Token Prediction. Our method forces the model to generate the target translation with less information, so that it learns better textual representations. Experiments on Chinese-English and English-Romanian benchmarks demonstrate the effectiveness of our approach, and our model achieves significant improvements over a strong Transformer baseline.
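Concretely, the corruption step can be pictured as randomly swapping target-side tokens for a dedicated placeholder and remembering where the swaps happened, so that Replaced Token Detection and Dropped Token Prediction have labels to train on. The sketch below is a minimal illustration under assumed names (the `<drop>` token id, drop rate, and `token_drop` helper are illustrative, not the authors' exact implementation):

```python
import torch

def token_drop(tokens, drop_token_id, drop_prob=0.15, pad_id=0):
    """Randomly replace tokens with a special <drop> token (illustrative sketch).

    Unlike word dropout, which zeroes word embeddings, Token Drop keeps a
    real embedding slot by substituting a dedicated placeholder token.
    Returns the corrupted sequence and a 0/1 mask of dropped positions,
    which can supervise Replaced Token Detection / Dropped Token Prediction.
    """
    drop_mask = (torch.rand(tokens.shape) < drop_prob) & (tokens != pad_id)
    corrupted = tokens.masked_fill(drop_mask, drop_token_id)
    return corrupted, drop_mask.long()

# Example: batch of 2 target sequences, assuming id 4 is the <drop> token.
tgt = torch.tensor([[11, 23, 7, 42, 0], [5, 9, 31, 17, 28]])
corrupted, dropped = token_drop(tgt, drop_token_id=4, drop_prob=0.3)
# `dropped` marks where Replaced Token Detection should predict "replaced";
# the original `tgt` at those positions is the Dropped Token Prediction label.
```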
Related papers
- A Classification-Guided Approach for Adversarial Attacks against Neural Machine Translation [66.58025084857556]
We introduce ACT, a novel adversarial attack framework against NMT systems guided by a classifier.
In our attack, the adversary aims to craft meaning-preserving adversarial examples whose translations belong to a different class than the original translations.
To evaluate the robustness of NMT models to our attack, we propose enhancements to existing black-box word-replacement-based attacks.
arXiv Detail & Related papers (2023-08-29T12:12:53Z)
- Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer [1.8594711725515678]
In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix.
Previous work on interpretability in NMT has focused solely on attributions of source sentence tokens.
We propose an interpretability method that tracks complete input token attributions.
arXiv Detail & Related papers (2022-05-23T20:59:14Z)
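As an illustration of attributing a prediction to the complete input (source tokens plus the target prefix), here is a generic input-times-gradient sketch with a toy stand-in model; it is not the interpretability method proposed in the paper, just a minimal example of scoring both input sides:

```python
import torch
import torch.nn as nn

# Toy stand-in for an encoder-decoder; a real study would use a trained
# Transformer. Everything below is an illustrative assumption.
vocab, dim = 100, 16
src_embed = nn.Embedding(vocab, dim)
tgt_embed = nn.Embedding(vocab, dim)
proj = nn.Linear(2 * dim, vocab)

def next_token_logits(src_emb, prefix_emb):
    # Crude "decoder": mean-pool source and prefix embeddings, then project.
    pooled = torch.cat([src_emb.mean(0), prefix_emb.mean(0)])
    return proj(pooled)

def attributions(src_ids, prefix_ids, predicted_id):
    """Input-x-gradient scores for BOTH source tokens and the target prefix,
    i.e. the complete input that conditions the next-token prediction."""
    src_emb = src_embed(src_ids).detach().requires_grad_(True)
    prefix_emb = tgt_embed(prefix_ids).detach().requires_grad_(True)
    next_token_logits(src_emb, prefix_emb)[predicted_id].backward()
    return (src_emb.grad * src_emb).sum(-1), (prefix_emb.grad * prefix_emb).sum(-1)

src_scores, prefix_scores = attributions(
    torch.tensor([3, 17, 42]), torch.tensor([8, 25]), predicted_id=7)
```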
- Improvement in Machine Translation with Generative Adversarial Networks [0.9612136532344103]
We take inspiration from RelGAN, a model for text generation, and NMT-GAN, an adversarial machine translation model, to implement a model that learns to transform awkward, non-fluent English sentences to fluent ones.
We utilize a parameter $\lambda$ to control the amount of deviation from the input sentence, i.e. a trade-off between keeping the original tokens and modifying the sentence to be more fluent.
arXiv Detail & Related papers (2021-11-30T06:51:13Z)
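A hedged sketch of how such a $\lambda$-controlled trade-off is typically realised as a weighted objective (the term names and weighting form are illustrative assumptions, not the paper's exact loss):

```python
import torch

def edit_objective(fluency_loss, deviation_penalty, lam=0.5):
    """Combine a fluency term with a faithfulness term.

    lam = 0 keeps the original tokens untouched (only faithfulness matters);
    lam = 1 allows arbitrary rewrites in pursuit of fluency.
    This weighting scheme is an illustrative assumption, not the paper's
    exact formulation.
    """
    return lam * fluency_loss + (1.0 - lam) * deviation_penalty

# Example with made-up scalar losses:
loss = edit_objective(torch.tensor(2.3), torch.tensor(0.8), lam=0.7)
```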
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
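A minimal sketch of the bidirectional early-stage idea, assuming it amounts to training on both translation directions before tuning on the normal direction only (the helper and switch point below are an illustrative reading, not the authors' exact recipe):

```python
def bidirectional_pairs(parallel_corpus):
    """Build training pairs in both directions for the early (pretraining) stage.

    parallel_corpus: iterable of (src_sentence, tgt_sentence) pairs.
    The early stage trains on src->tgt plus tgt->src; afterwards the model
    is tuned normally on src->tgt only.
    """
    pairs = []
    for src, tgt in parallel_corpus:
        pairs.append((src, tgt))  # normal direction
        pairs.append((tgt, src))  # reversed direction for bidirectional updates
    return pairs

corpus = [("wo ai ni", "i love you"), ("ni hao", "hello")]
early_stage_data = bidirectional_pairs(corpus)  # used at the early stage
late_stage_data = corpus                        # then tune normally
```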
- You should evaluate your language model on marginal likelihood over tokenisations [5.824498637088864]
We argue that language models should be evaluated on their marginal likelihood over tokenisations.
We evaluate pretrained English and German language models on both the one-best-tokenisation and marginal perplexities.
arXiv Detail & Related papers (2021-09-06T15:37:02Z)
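As a worked illustration: the marginal likelihood of a string sums the joint probability over every tokenisation that yields it, log p(x) = logsumexp_t log p(x, t), which is at least as large as the usual one-best-tokenisation score. A small sketch with made-up scores:

```python
import math

def marginal_log_likelihood(log_probs_per_tokenisation):
    """log p(x) = logsumexp_t log p(x, t): sum probability mass over every
    tokenisation t of the same string x (illustrative sketch)."""
    m = max(log_probs_per_tokenisation)
    return m + math.log(sum(math.exp(lp - m) for lp in log_probs_per_tokenisation))

# Two hypothetical tokenisations of the same sentence scored by a model:
scores = [-12.7, -14.2]            # log p(x, t) for each tokenisation
one_best = max(scores)             # the usual single-tokenisation evaluation
marginal = marginal_log_likelihood(scores)
assert marginal >= one_best        # marginalising can only add probability mass
```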
- Confidence-Aware Scheduled Sampling for Neural Machine Translation [25.406119773503786]
We propose confidence-aware scheduled sampling for neural machine translation.
We quantify real-time model competence by the confidence of model predictions.
Our approach significantly outperforms the Transformer and vanilla scheduled sampling on both translation quality and convergence speed.
arXiv Detail & Related papers (2021-07-22T02:49:04Z)
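One plausible reading of that mechanism, sketched below under assumed names and an assumed threshold rule (not necessarily the paper's exact schedule): feed the decoder its own prediction where it is confident and the gold token where it is not.

```python
import torch

def choose_decoder_inputs(gold_tokens, predicted_tokens, predicted_probs,
                          confidence_threshold=0.9):
    """Confidence-aware scheduled sampling step (illustrative sketch).

    Where the model is confident (probability of its prediction above the
    threshold), feed back its own prediction; where it is not, fall back to
    the gold token, easing the train/inference mismatch gradually.
    """
    use_prediction = predicted_probs > confidence_threshold
    return torch.where(use_prediction, predicted_tokens, gold_tokens)

gold = torch.tensor([11, 23, 7, 42])
pred = torch.tensor([11, 19, 7, 40])
probs = torch.tensor([0.97, 0.55, 0.93, 0.61])
next_inputs = choose_decoder_inputs(gold, pred, probs)  # mixes pred and gold per position
```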
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
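To make the contrast concrete, here is a rough sketch of the two kinds of corrupted inputs, masking versus local reordering; the probabilities and helper names are illustrative, not the paper's exact noising functions:

```python
import random

def mask_corruption(tokens, mask_token="<mask>", p=0.35, seed=0):
    """MLM-style corruption: masked positions leave visible gaps in the input."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < p else t for t in tokens]

def reorder_corruption(tokens, max_shift=2, seed=0):
    """Alternative corruption: locally shuffle words so the input still
    looks like a real (full) sentence, just with perturbed order."""
    rng = random.Random(seed)
    keys = [i + rng.uniform(0, max_shift) for i in range(len(tokens))]
    return [t for _, t in sorted(zip(keys, tokens))]

sent = "the cat sat on the mat".split()
print(mask_corruption(sent))     # some tokens replaced by '<mask>'
print(reorder_corruption(sent))  # same tokens, locally shuffled order
```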
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method can achieve $33.18$ BLEU score on IWSLT14 German-English translation, achieving an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- Token-level Adaptive Training for Neural Machine Translation [84.69646428587548]
There exists a token imbalance phenomenon in natural language as different tokens appear with different frequencies.
The vanilla NMT model usually adopts a trivial equal-weighted objective for target tokens with different frequencies.
Low-frequency tokens may carry critical semantic information that degrades translation quality if they are neglected.
arXiv Detail & Related papers (2020-10-09T05:55:05Z)
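One common way to act on that imbalance is to scale each target token's loss by a function of its frequency; the weighting below is an illustrative choice rather than the paper's exact formula:

```python
import math
from collections import Counter

def frequency_based_weights(target_corpus_tokens, smoothing=1.0):
    """Per-token loss weights that grow as a token gets rarer (illustrative).

    A vanilla NMT objective weights every target token equally; adaptive
    training instead scales each token's cross-entropy term so that
    low-frequency tokens are not drowned out by frequent ones.
    """
    counts = Counter(target_corpus_tokens)
    total = sum(counts.values())
    return {tok: 1.0 + smoothing * math.log(total / counts[tok])
            for tok in counts}

tokens = ["the", "the", "the", "cat", "sat", "the", "on", "mat", "the"]
weights = frequency_based_weights(tokens)
# weights["mat"] > weights["the"]: rare tokens get a larger loss weight.
```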
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
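A minimal sketch of attentively pooling token representations into a single phrase representation (the dimensions and pooling form are assumptions; the paper's mechanism may differ):

```python
import torch
import torch.nn as nn

class AttentivePhrasePooling(nn.Module):
    """Collapse the token vectors of a phrase span into one phrase vector
    via a learned attention weighting (illustrative sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, token_reprs):                               # (phrase_len, dim)
        weights = torch.softmax(self.score(token_reprs), dim=0)   # (phrase_len, 1)
        return (weights * token_reprs).sum(0)                     # (dim,)

pool = AttentivePhrasePooling(dim=8)
phrase_vec = pool(torch.randn(3, 8))  # 3 token vectors -> one phrase representation
```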
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.