Bilingual Mutual Information Based Adaptive Training for Neural Machine
Translation
- URL: http://arxiv.org/abs/2105.12523v2
- Date: Thu, 27 May 2021 03:26:02 GMT
- Title: Bilingual Mutual Information Based Adaptive Training for Neural Machine
Translation
- Authors: Yangyifan Xu, Yijin Liu, Fandong Meng, Jiajun Zhang, Jinan Xu, Jie
Zhou
- Abstract summary: We propose a novel bilingual mutual information (BMI) based adaptive objective, which measures the learning difficulty for each target token from the perspective of bilingualism.
Experimental results on WMT14 English-to-German and WMT19 Chinese-to-English demonstrate the superiority of our approach compared with the Transformer baseline and previous token-level adaptive training approaches.
- Score: 38.83163343372786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, token-level adaptive training has achieved promising improvement in
machine translation, where the cross-entropy loss function is adjusted by
assigning different training weights to different tokens, in order to alleviate
the token imbalance problem. However, previous approaches only use static word
frequency information in the target language without considering the source
language, which is insufficient for bilingual tasks like machine translation.
In this paper, we propose a novel bilingual mutual information (BMI) based
adaptive objective, which measures the learning difficulty for each target
token from the perspective of bilingualism, and assigns an adaptive weight
accordingly to improve token-level adaptive training. This method assigns
larger training weights to tokens with higher BMI, so that easy tokens are
updated with coarse granularity while difficult tokens are updated with fine
granularity. Experimental results on WMT14 English-to-German and WMT19
Chinese-to-English demonstrate the superiority of our approach compared with
the Transformer baseline and previous token-level adaptive training approaches.
Further analyses confirm that our method improves lexical diversity.
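To make the weighting idea concrete, below is a minimal Python sketch of a BMI-style token-level adaptive objective. It assumes BMI for a target token is estimated as a count-weighted average pointwise mutual information over word-aligned source-target token pairs, and that weights grow linearly with BMI (following the abstract's statement that higher-BMI tokens receive larger weights). The names estimate_bmi, bmi_weights, and weighted_nll, and the scale/floor parameters, are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

# Minimal sketch of BMI-based token weighting (illustrative, not the paper's exact formulation).
# Assumption: BMI of a target token y is estimated as the count-weighted average pointwise
# mutual information PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ) over its aligned source tokens.

def estimate_bmi(aligned_pairs):
    """aligned_pairs: list of (source_token, target_token) tuples from word alignment."""
    pair_counts = Counter(aligned_pairs)
    src_counts = Counter(x for x, _ in aligned_pairs)
    tgt_counts = Counter(y for _, y in aligned_pairs)
    total = len(aligned_pairs)

    pmi_sums, pmi_nums = Counter(), Counter()
    for (x, y), c in pair_counts.items():
        pmi = math.log(c * total / (src_counts[x] * tgt_counts[y]))
        pmi_sums[y] += pmi * c
        pmi_nums[y] += c
    # BMI of a target token: count-weighted average PMI over its aligned source tokens.
    return {y: pmi_sums[y] / pmi_nums[y] for y in pmi_nums}

def bmi_weights(bmi, scale=1.0, floor=0.5):
    """Map BMI scores to per-token training weights (hypothetical linear min-max mapping)."""
    lo, hi = min(bmi.values()), max(bmi.values())
    span = (hi - lo) or 1.0
    return {y: floor + scale * (v - lo) / span for y, v in bmi.items()}

def weighted_nll(token_log_probs, target_tokens, weights, default_w=1.0):
    """Token-level adaptive cross-entropy: scale each token's negative log-likelihood by its weight."""
    loss = 0.0
    for logp, y in zip(token_log_probs, target_tokens):
        loss += -logp * weights.get(y, default_w)
    return loss / max(1, len(target_tokens))
```

In a training loop, one would precompute weights = bmi_weights(estimate_bmi(aligned_pairs)) once from the aligned training corpus and substitute weighted_nll for the standard per-token cross-entropy.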
Related papers
- Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin [3.2039731457723604]
We aim to improve upon both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus.
Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks with up to 2.38 BLEU improvements.
arXiv Detail & Related papers (2023-07-01T16:47:36Z)
- Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation [66.23055784400475]
Token-level adaptive training approaches can alleviate the token imbalance problem.
We propose a target-context-aware metric, named conditional bilingual mutual information (CBMI).
CBMI can be efficiently calculated during model training without any pre-specified statistical calculations.
arXiv Detail & Related papers (2022-03-06T12:34:10Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Modelling Latent Translations for Cross-Lingual Transfer [47.61502999819699]
We propose a new technique that integrates both steps of the traditional pipeline (translation and classification) into a single model.
We evaluate our novel latent translation-based model on a series of multilingual NLU tasks.
We report gains for both zero-shot and few-shot learning setups, up to 2.7 accuracy points on average.
arXiv Detail & Related papers (2021-07-23T17:11:27Z)
- Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation [53.22775597051498]
We present a continual pre-training framework on mBART to effectively adapt it to unseen languages.
Results show that our method can consistently improve the fine-tuning performance upon the mBART baseline.
Our approach also boosts the performance on translation pairs where both languages are seen in the original mBART's pre-training.
arXiv Detail & Related papers (2021-05-09T14:49:07Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of works: zero-shot approach and translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Token-level Adaptive Training for Neural Machine Translation [84.69646428587548]
There exists a token imbalance phenomenon in natural language, as different tokens appear with different frequencies.
The vanilla NMT model usually adopts a trivial equal-weighted objective for target tokens with different frequencies.
Low-frequency tokens may carry critical semantic information that will affect translation quality once they are neglected; a frequency-based weighting sketch follows after this list.
arXiv Detail & Related papers (2020-10-09T05:55:05Z)
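For contrast with the BMI sketch above, here is a minimal sketch of the target-side, frequency-only weighting described by the last related paper. The function frequency_weights and the alpha/max_w parameters are hypothetical, and the mapping is illustrative rather than the cited paper's formula; it only shows the general idea of up-weighting rare target tokens.

```python
import math
from collections import Counter

# Illustrative frequency-based token weighting in the spirit of target-side
# token-level adaptive training (hypothetical weighting function, not the
# cited paper's exact formulation): rare target tokens get larger weights so
# their gradients are not dominated by frequent tokens.

def frequency_weights(target_corpus_tokens, alpha=0.1, max_w=3.0):
    """Map raw target-token frequencies to training weights.

    Rare tokens get weights closer to max_w; very frequent tokens stay near 1.0.
    """
    counts = Counter(target_corpus_tokens)
    total = sum(counts.values())
    weights = {}
    for tok, c in counts.items():
        rel_freq = c / total
        # Monotonically decreasing in frequency; bounded in [1.0, max_w].
        weights[tok] = min(max_w, 1.0 - alpha * math.log(rel_freq))
    return weights
```

Such weights depend only on target-language statistics, which is precisely the limitation that the BMI-based objective above addresses by also taking the source sentence into account.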