On the Complementarity between Pre-Training and Random-Initialization
for Resource-Rich Machine Translation
- URL: http://arxiv.org/abs/2209.03316v1
- Date: Wed, 7 Sep 2022 17:23:08 GMT
- Title: On the Complementarity between Pre-Training and Random-Initialization
for Resource-Rich Machine Translation
- Authors: Changtong Zan, Liang Ding, Li Shen, Yu Cao, Weifeng Liu, Dacheng Tao
- Abstract summary: Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose to combine their complementarities with a model fusion algorithm that utilizes optimal transport to align neurons between PT and RI.
Experiments on two resource-rich translation benchmarks, WMT'17 English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI could be nicely complementary to each other.
- Score: 80.16548523140025
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Pre-Training (PT) of text representations has been successfully applied to
low-resource Neural Machine Translation (NMT). However, it usually fails to
achieve notable gains (and sometimes even hurts performance) on resource-rich NMT
compared with its Random-Initialization (RI) counterpart. We take the first step to
investigate the complementarity between PT and RI in resource-rich scenarios
via two probing analyses, and find that: 1) PT improves not the accuracy but
the generalization, by achieving flatter loss landscapes than RI; 2) PT
improves not the confidence of lexical choice but the negative diversity, by
assigning smoother lexical probability distributions than RI. Based on
these insights, we propose to combine their complementarities with a model
fusion algorithm that utilizes optimal transport to align neurons between PT
and RI. Experiments on two resource-rich translation benchmarks, WMT'17
English-Chinese (20M) and WMT'19 English-German (36M), show that PT and RI
are nicely complementary to each other, achieving substantial improvements
in translation accuracy, generalization, and negative diversity.
Probing tools and code are released at: https://github.com/zanchangtong/PTvsRI.
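The fusion step above lends itself to a compact illustration. Below is a minimal, hypothetical NumPy sketch of optimal-transport-based neuron alignment followed by weight interpolation for a single layer: a Sinkhorn solver with uniform neuron masses and a cosine cost aligns the PT layer's neurons to the RI layer's, and the aligned weights are averaged. All function names, the cost choice, and the solver settings are illustrative assumptions and are not taken from the released PTvsRI code.

```python
# Hypothetical sketch: align the neurons of one pre-trained (PT) layer to the
# randomly-initialized (RI) layer with optimal transport, then interpolate the
# aligned weights. Names and settings are illustrative only.
import numpy as np

def sinkhorn(cost, reg=0.05, n_iters=200):
    """Entropy-regularized OT between uniform marginals (plain Sinkhorn iterations)."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m   # uniform neuron masses (assumption)
    K = np.exp(-cost / reg)                  # Gibbs kernel
    v = np.ones(m) / m
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan: rows index PT neurons

def fuse_layer(w_pt, w_ri, alpha=0.5):
    """Soft-align PT neurons (rows of w_pt) to RI neurons, then average the weights."""
    # Cost = 1 - cosine similarity between neuron weight vectors.
    norm_pt = np.linalg.norm(w_pt, axis=1, keepdims=True).clip(min=1e-9)
    norm_ri = np.linalg.norm(w_ri, axis=1, keepdims=True).clip(min=1e-9)
    cost = 1.0 - (w_pt / norm_pt) @ (w_ri / norm_ri).T
    plan = sinkhorn(cost)
    plan = plan / plan.sum(axis=0, keepdims=True)   # each RI neuron gets a convex mix
    w_pt_aligned = plan.T @ w_pt                    # barycentric projection onto RI order
    return alpha * w_ri + (1.0 - alpha) * w_pt_aligned

# Toy usage: fuse a single 4x8 weight matrix from each model.
rng = np.random.default_rng(0)
print(fuse_layer(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))).shape)  # (4, 8)
```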
Related papers
- Mismatching-Aware Unsupervised Translation Quality Estimation For Low-Resource Languages [6.049660810617423]
XLMRScore is a cross-lingual counterpart of BERTScore computed via the XLM-RoBERTa (XLMR) model.
We evaluate the proposed method on four low-resource language pairs of the WMT21 QE shared task.
arXiv Detail & Related papers (2022-07-31T16:23:23Z)
- BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation [53.55009917938002]
We propose to refine the mined bitexts via automatic editing.
Experiments demonstrate that our approach successfully improves the quality of CCMatrix mined bitext for 5 low-resource language-pairs and 10 translation directions by up to 8 BLEU points.
arXiv Detail & Related papers (2021-11-12T16:00:39Z)
- On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation [63.914940899327966]
Pre-training (PT) and back-translation (BT) are two simple and powerful methods to utilize monolingual data.
This paper takes the first step to investigate the complementarity between PT and BT.
We establish state-of-the-art performance on the WMT16 English-Romanian and English-Russian benchmarks.
arXiv Detail & Related papers (2021-10-05T04:01:36Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- Exploiting Neural Query Translation into Cross Lingual Information Retrieval [49.167049709403166]
Existing CLIR systems mainly exploit statistical machine translation (SMT) rather than the more advanced neural machine translation (NMT).
We propose a novel data augmentation method that extracts query translation pairs according to user clickthrough data.
Experimental results reveal that the proposed approach yields better retrieval quality than strong baselines.
arXiv Detail & Related papers (2020-10-26T15:28:19Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
- Language Model Prior for Low-Resource Neural Machine Translation [85.55729693003829]
We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term that pushes the output distributions of the TM to be probable under the LM prior (a minimal sketch of one such term follows this list).
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
arXiv Detail & Related papers (2020-04-30T16:29:56Z)
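The regularization term described in the LM-prior entry above can be sketched in a few lines. Below is a minimal, hypothetical PyTorch illustration: standard cross-entropy plus a temperature-scaled KL penalty between a frozen LM's token distribution and the translation model's (TM's) distribution. The KL direction, temperature, weight, and function names are assumptions for illustration; the paper's exact formulation may differ.

```python
# Hypothetical sketch of an LM-as-prior regularizer for NMT training: token-level
# cross-entropy plus a KL penalty that keeps the TM's output distribution close to
# a frozen LM's distribution. Weights, temperature, and names are illustrative.
import torch
import torch.nn.functional as F

def lm_prior_loss(tm_logits, lm_logits, targets, lam=0.5, tau=2.0, pad_id=0):
    """tm_logits, lm_logits: (batch, seq, vocab); targets: (batch, seq) gold token ids."""
    # Standard cross-entropy of the TM against the reference tokens.
    ce = F.cross_entropy(tm_logits.transpose(1, 2), targets, ignore_index=pad_id)
    # KL(LM || TM) at temperature tau; the LM is frozen (detached), so only the TM is updated.
    kl = F.kl_div(
        F.log_softmax(tm_logits / tau, dim=-1),           # TM log-probs (input)
        F.log_softmax(lm_logits.detach() / tau, dim=-1),  # LM log-probs (target)
        log_target=True,
        reduction="none",
    ).sum(-1)
    mask = targets.ne(pad_id).float()                     # ignore padding positions
    kl = (kl * mask).sum() / mask.sum().clamp(min=1.0)
    return ce + lam * kl
```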