Code-switching pre-training for neural machine translation
- URL: http://arxiv.org/abs/2009.08088v1
- Date: Thu, 17 Sep 2020 06:10:07 GMT
- Title: Code-switching pre-training for neural machine translation
- Authors: Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang and Qi Ju
- Abstract summary: This paper proposes a new pre-training method, called Code-Switching Pre-training (CSP), for Neural Machine Translation (NMT).
Unlike traditional pre-training methods, which randomly mask some fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translation words in the target language.
Experimental results show that CSP achieves significant improvements over baselines without pre-training or with other pre-training methods.
- Score: 13.35263905025371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a new pre-training method, called Code-Switching
Pre-training (CSP for short), for Neural Machine Translation (NMT). Unlike
traditional pre-training methods, which randomly mask some fragments of the
input sentence, the proposed CSP randomly replaces some words in the source
sentence with their translation words in the target language. Specifically, we
first perform lexicon induction with unsupervised word embedding mapping
between the source and target languages, and then randomly replace some words
in the input sentence with their translation words according to the extracted
translation lexicons. CSP adopts the encoder-decoder framework: its encoder
takes the code-mixed sentence as input, and its decoder predicts the replaced
fragment of the input sentence. In this way, CSP is able to pre-train the NMT
model by explicitly making the most of the cross-lingual alignment information
extracted from the source and target monolingual corpora. Additionally, CSP
relieves the pretrain-finetune discrepancy caused by artificial symbols like
[mask]. To verify the effectiveness of the proposed method, we conduct
extensive experiments on unsupervised and supervised NMT. Experimental results
show that CSP achieves significant improvements over baselines without
pre-training or with other pre-training methods.
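As a rough illustration of the replacement step, the sketch below shows how a source sentence could be code-mixed using a pre-extracted translation lexicon. This is a minimal sketch, not the authors' implementation: the function name, toy lexicon, and replacement probability are assumptions made for the example, and the paper induces the lexicon via unsupervised word embedding mapping, which is assumed here to have been done offline.

    import random

    def code_switch(source_tokens, lexicon, replace_prob=0.15):
        """Randomly replace source-language words with target-language translations.

        source_tokens: tokens of one monolingual source sentence
        lexicon:       dict mapping a source word to an induced translation
                       (in the paper this comes from unsupervised embedding mapping)
        replace_prob:  chance of replacing an eligible token (illustrative value)

        Returns the code-mixed encoder input and the (position, original word)
        pairs that the decoder is trained to reconstruct.
        """
        mixed, targets = [], []
        for i, tok in enumerate(source_tokens):
            if tok in lexicon and random.random() < replace_prob:
                mixed.append(lexicon[tok])   # substitute the translation word
                targets.append((i, tok))     # decoder must predict the replaced word
            else:
                mixed.append(tok)
        return mixed, targets

    # Toy usage with a hypothetical English-German lexicon
    lexicon = {"house": "Haus", "green": "grün", "dog": "Hund"}
    mixed, targets = code_switch("the green house is big".split(), lexicon, replace_prob=0.5)
    print(mixed)    # e.g. ['the', 'grün', 'house', 'is', 'big'] (varies per draw)
    print(targets)  # e.g. [(1, 'green')]

Pairing the code-mixed input with the replaced words gives the encoder-decoder a pre-training signal that exposes cross-lingual alignments drawn from monolingual data only.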
Related papers
- Language Model is a Branch Predictor for Simultaneous Machine Translation [73.82754138171587]
We propose incorporating branch prediction techniques in SiMT tasks to reduce translation latency.
We utilize a language model as a branch predictor to predict potential branch directions.
When the actual source word deviates from the predicted source word, we use the real source word to decode the output again, replacing the predicted output.
arXiv Detail & Related papers (2023-12-22T07:32:47Z) - CBSiMT: Mitigating Hallucination in Simultaneous Machine Translation with Weighted Prefix-to-Prefix Training [13.462260072313894]
Simultaneous machine translation (SiMT) is a challenging task that requires starting translation before the full source sentence is available.
The prefix-to-prefix framework is often applied to SiMT, which learns to predict target tokens using only a partial source prefix.
We propose a Confidence-Based Simultaneous Machine Translation framework, which uses model confidence to perceive hallucination tokens.
arXiv Detail & Related papers (2023-11-07T02:44:45Z) - Code-Switching with Word Senses for Pretraining in Neural Machine Translation [107.23743153715799]
We introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT).
WSP-NMT is an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases.
Our experiments show significant improvements in overall translation quality.
arXiv Detail & Related papers (2023-10-21T16:13:01Z) - Learning Homographic Disambiguation Representation for Neural Machine Translation [20.242134720005467]
Homographs, words with the same spelling but different meanings, remain challenging for Neural Machine Translation (NMT).
We propose a novel approach to tackle these issues for NMT in the latent space.
We first train an encoder (aka "homographic-encoder") to learn universal sentence representations in a natural language inference (NLI) task.
We further fine-tune the encoder using homograph-based synsets from WordNet, enabling it to learn word-set representations from sentences.
arXiv Detail & Related papers (2023-04-12T13:42:59Z) - Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer [1.8594711725515678]
In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix.
Previous work on interpretability in NMT has focused solely on source sentence token attributions.
We propose an interpretability method that tracks complete input token attributions.
arXiv Detail & Related papers (2022-05-23T20:59:14Z) - Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation [48.50842995206353]
We study the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT.
We propose simple and effective strategies, namely in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies.
arXiv Detail & Related papers (2022-03-16T07:36:28Z) - Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - Explicit Sentence Compression for Neural Machine Translation [110.98786673598016]
State-of-the-art Transformer-based neural machine translation (NMT) systems still follow a standard encoder-decoder framework.
However, backbone information, which conveys the gist of a sentence, is not specifically focused on.
We propose an explicit sentence compression method to enhance the source sentence representation for NMT.
arXiv Detail & Related papers (2019-12-27T04:14:06Z)