Conditional Bilingual Mutual Information Based Adaptive Training for
Neural Machine Translation
- URL: http://arxiv.org/abs/2203.02951v1
- Date: Sun, 6 Mar 2022 12:34:10 GMT
- Title: Conditional Bilingual Mutual Information Based Adaptive Training for
Neural Machine Translation
- Authors: Songming Zhang, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jian
Liu and Jie Zhou
- Abstract summary: Token-level adaptive training approaches can alleviate the token imbalance problem.
We propose a target-context-aware metric, named conditional bilingual mutual information (CBMI).
CBMI can be efficiently calculated during model training without any pre-computed statistics.
- Score: 66.23055784400475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Token-level adaptive training approaches can alleviate the token imbalance
problem and thus improve neural machine translation by re-weighting the
losses of different target tokens based on specific statistical metrics (e.g.,
token frequency or mutual information). Given that standard translation models
make predictions conditioned on previous target contexts, we argue that
the above statistical metrics ignore target context information and may assign
inappropriate weights to target tokens. While one possible solution is to
directly incorporate target contexts into these statistical metrics, such
target-context-aware statistical computation is extremely expensive, and the
corresponding storage overhead is impractical. To solve these issues, we
propose a target-context-aware metric, named conditional bilingual mutual
information (CBMI), which makes it feasible to supplement target context
information for statistical metrics. Specifically, our CBMI can be formalized
as the log quotient of the translation model probability and the language model
probability by decomposing the conditional joint distribution. Thus, CBMI can be
efficiently calculated during model training without any pre-computed
statistics or large storage overhead. Furthermore, we propose an
effective adaptive training approach based on both the token- and
sentence-level CBMI. Experimental results on WMT14 English-German and WMT19
Chinese-English tasks show our approach can significantly outperform the
Transformer baseline and other related methods.
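To make the formalization above concrete, the log-quotient form described in the abstract can be written as follows. The notation is ours, not the paper's: x denotes the source sentence, y_{<t} the target prefix, and y_t the current target token.

```latex
% CBMI as the log quotient of translation-model (TM) and language-model (LM) probabilities,
% obtained by decomposing the conditional joint distribution (notation is ours).
\mathrm{CBMI}(y_t) \;=\; \log \frac{p(y_t \mid \mathbf{x}, \mathbf{y}_{<t})}{p(y_t \mid \mathbf{y}_{<t})}
\;=\; \log p_{\mathrm{TM}}(y_t \mid \mathbf{x}, \mathbf{y}_{<t}) \;-\; \log p_{\mathrm{LM}}(y_t \mid \mathbf{y}_{<t})
```

Since both probabilities are produced by models that are already being evaluated during training, no corpus-level statistics need to be computed or stored in advance.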
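Building on that, here is a minimal, hedged PyTorch sketch of how token- and sentence-level CBMI re-weighting of the cross-entropy loss could look. The function name `cbmi_weighted_loss`, the batch-level standardization, and the `1 + alpha * CBMI` weighting are illustrative assumptions rather than the paper's exact scheme; only the CBMI quantity itself (log p_TM minus log p_LM) comes from the abstract.

```python
import torch
import torch.nn.functional as F


def cbmi_weighted_loss(tm_logits, lm_logits, target, pad_id, alpha=1.0, beta=1.0):
    """Cross-entropy re-weighted by token- and sentence-level CBMI (illustrative sketch).

    tm_logits: (batch, tgt_len, vocab) logits of the translation model p(y_t | x, y_<t)
    lm_logits: (batch, tgt_len, vocab) logits of a target-side language model p(y_t | y_<t)
    target:    (batch, tgt_len) gold target token ids
    """
    mask = target.ne(pad_id).float()

    # Per-token log-probabilities of the gold tokens under both models.
    tm_logp = F.log_softmax(tm_logits, dim=-1).gather(-1, target.unsqueeze(-1)).squeeze(-1)
    lm_logp = F.log_softmax(lm_logits, dim=-1).gather(-1, target.unsqueeze(-1)).squeeze(-1)

    with torch.no_grad():
        # Token-level CBMI = log p_TM(y_t | x, y_<t) - log p_LM(y_t | y_<t).
        token_cbmi = (tm_logp - lm_logp) * mask
        # Sentence-level CBMI: mean of token-level CBMI over non-pad positions.
        sent_cbmi = token_cbmi.sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)

        def standardize(x, m):
            mean = (x * m).sum() / m.sum()
            std = (((x - mean) ** 2 * m).sum() / m.sum()).sqrt().clamp(min=1e-6)
            return (x - mean) / std

        # Assumed weighting: centre weights around 1.0 using batch-standardized CBMI.
        token_w = 1.0 + alpha * standardize(token_cbmi, mask)
        sent_w = 1.0 + beta * standardize(sent_cbmi, torch.ones_like(sent_cbmi))
        weight = (token_w * sent_w.unsqueeze(-1)).clamp(min=0.0)

    nll = -tm_logp  # standard per-token cross-entropy
    return (weight * nll * mask).sum() / mask.sum()
```

In practice, `lm_logits` would come from a target-side language model run alongside the translation model during training; how that LM is parameterized and trained is left unspecified here.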
Related papers
- Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
- TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train a Confidence Estimation Model (CEM).
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z)
- Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages [18.210880703295253]
We finetune pretrained language models (PLMs) on seven languages from three different families.
We analyze their zero-shot performance on closely related, non-standardized varieties.
Overall, we find that the similarity between the percentage of words that get split into subwords in the source and target data is the strongest predictor of model performance on target data.
arXiv Detail & Related papers (2023-04-20T08:32:34Z)
- Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task [61.34108034582074]
We build our system based on the core idea of UNITE (Unified Translation Evaluation).
During the model pre-training phase, we first apply the pseudo-labeled data examples to continuously pre-train UNITE.
During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions.
arXiv Detail & Related papers (2022-10-18T08:51:25Z)
- Bridging the Data Gap between Training and Inference for Unsupervised Neural Machine Translation [49.916963624249355]
A UNMT model is trained on the pseudo parallel data with translated source, and translates natural source sentences in inference.
The source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach, which simultaneously uses the pseudo parallel data {natural source, translated target} to mimic the inference scenario.
arXiv Detail & Related papers (2022-03-16T04:50:27Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation [38.83163343372786]
We propose a novel bilingual mutual information (BMI) based adaptive objective, which measures the learning difficulty for each target token from the perspective of bilingualism.
Experimental results on WMT14 English-to-German and WMT19 Chinese-to-English demonstrate the superiority of our approach compared with the Transformer baseline and previous token-level adaptive training approaches.
arXiv Detail & Related papers (2021-05-26T12:54:24Z)
- Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment [5.747195707763152]
We build an entity alignment model on top of XLM-RoBERTa to project the entities detected on the English part of the parallel data to the target language sentences.
Unlike translation-based methods, this approach benefits from the natural fluency and nuances of the target-language original corpus.
We evaluate the proposed approach over 4 target languages on benchmark data sets and obtain competitive F1 scores compared to the most recent SOTA models.
arXiv Detail & Related papers (2021-01-26T22:19:52Z)
- Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded.
We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
arXiv Detail & Related papers (2020-12-24T07:37:19Z)