Improving Language Model Integration for Neural Machine Translation
- URL: http://arxiv.org/abs/2306.05077v1
- Date: Thu, 8 Jun 2023 10:00:19 GMT
- Title: Improving Language Model Integration for Neural Machine Translation
- Authors: Christian Herold and Yingbo Gao and Mohammad Zeineldeen and Hermann Ney
- Abstract summary: We show that accounting for the implicit language model significantly boosts the performance of language model fusion.
- Score: 43.85486035238116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of language models for neural machine translation has been
extensively studied in the past. It has been shown that an external language
model, trained on additional target-side monolingual data, can help improve
translation quality. However, there has always been the assumption that the
translation model also learns an implicit target-side language model during
training, which interferes with the external language model at decoding time.
Recently, some works on automatic speech recognition have demonstrated that, if
the implicit language model is neutralized in decoding, further improvements
can be gained when integrating an external language model. In this work, we
transfer this concept to the task of machine translation and compare it with the
most prominent way of including additional monolingual data, namely
back-translation. We find that accounting for the implicit language model
significantly boosts the performance of language model fusion, although this
approach is still outperformed by back-translation.
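As a rough illustration of the idea, the ILM-corrected fusion score can be written as a per-token combination of the translation model, the external language model, and an estimate of the implicit language model. The sketch below follows a common formulation from the ASR literature; the weights and the way the ILM estimate is obtained are assumptions for illustration, not necessarily the exact variant used in the paper.

```python
def fused_score(log_p_tm: float, log_p_lm: float, log_p_ilm: float,
                lam_lm: float = 0.3, lam_ilm: float = 0.2) -> float:
    """Per-token decoding score for LM fusion with ILM correction.

    log_p_tm  : log P_TM(y_t | y_<t, x)   translation model
    log_p_lm  : log P_LM(y_t | y_<t)      external language model
    log_p_ilm : log P_ILM(y_t | y_<t)     estimate of the implicit LM learned by
                the translation model (how it is estimated is an assumption here)
    lam_lm, lam_ilm : interpolation weights; illustrative values, tuned on a dev set
    """
    return log_p_tm + lam_lm * log_p_lm - lam_ilm * log_p_ilm

# Plain shallow fusion is recovered as the special case lam_ilm = 0.
assert fused_score(-1.0, -2.0, -3.0, lam_ilm=0.0) == -1.0 + 0.3 * -2.0
```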
Related papers
- Self-Translate-Train: Enhancing Cross-Lingual Transfer of Large Language Models via Inherent Capability [31.025371443719404]
Self-Translate-Train is a method that lets large language models translate training data into the target language and fine-tunes the model on its own generated data.
By demonstrating that Self-Translate-Train outperforms zero-shot transfer, we encourage further exploration of better methods to elicit cross-lingual capabilities of LLMs.
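A minimal sketch of the Self-Translate-Train loop as described above, assuming the translation and fine-tuning steps are provided as callables; the data layout and helper signatures are illustrative, not the paper's API.

```python
from typing import Callable, Dict, List

def self_translate_train(
    translate_fn: Callable[[str, str], str],    # (text, tgt_lang) -> translation, produced by the LLM itself
    finetune_fn: Callable[[List[Dict]], None],  # fine-tunes the same LLM on the generated data
    train_data: List[Dict],                     # e.g. [{"text": ..., "label": ...}, ...]
    tgt_lang: str,
) -> List[Dict]:
    """Sketch of Self-Translate-Train: the model translates its own training
    data into the target language, then is fine-tuned on that output."""
    synthetic = [{"text": translate_fn(ex["text"], tgt_lang), "label": ex["label"]}
                 for ex in train_data]
    finetune_fn(synthetic)
    return synthetic
```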
arXiv Detail & Related papers (2024-06-29T14:40:23Z)
- Do Multilingual Language Models Think Better in English? [24.713751471567395]
Translate-test is a popular technique to improve the performance of multilingual language models.
In this work, we introduce a new approach called self-translate, which overcomes the need for an external translation system.
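A minimal sketch of the self-translate idea, assuming the same multilingual model is simply prompted twice; the prompt wording is an illustrative assumption.

```python
from typing import Callable

def self_translate_answer(lm: Callable[[str], str], question: str) -> str:
    """Self-translate sketch: the same multilingual LM first translates the
    input into English, then solves the task in English, so no external MT
    system is required."""
    english = lm(f"Translate to English: {question}")
    return lm(f"Answer the question: {english}")
```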
arXiv Detail & Related papers (2023-08-02T15:29:22Z)
- Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining [2.3513645401551333]
We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model to a new language.
Our results show that continued pretraining is the most effective method to adapt a wav2vec 2.0 model for a new language.
We find that if a model pretrained on a related speech variety, or on an unrelated language with similar phonological characteristics, is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance.
arXiv Detail & Related papers (2023-01-18T03:57:53Z)
- Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models [11.439430077017635]
We find that pre-trained speech models optimally encode language discriminatory information in lower layers.
We demonstrate that the embeddings obtained from these layers are highly robust for classifying unseen languages.
We open-source the model through the NVIDIA NeMo toolkit.
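A minimal sketch of the probing setup implied by the summary: pool frame-level features from a lower encoder layer and put a simple linear language-ID classifier on top. The layer index, feature dimension, and number of languages are illustrative assumptions.

```python
import torch

def lower_layer_embedding(hidden_states: list, layer: int = 4) -> torch.Tensor:
    """Mean-pool frames from a lower encoder layer (the layer index is an assumption).
    hidden_states: per-layer tensors of shape (batch, frames, dim), e.g. the hidden
    states exposed by a pre-trained multilingual speech encoder."""
    return hidden_states[layer].mean(dim=1)  # (batch, dim)

# A simple linear probe on the pooled embedding is then trained to predict the
# language ID; illustrative sizes: 768-dim features, 107 candidate languages.
lid_probe = torch.nn.Linear(768, 107)
```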
arXiv Detail & Related papers (2022-11-09T18:53:59Z)
- MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation [0.0]
Large pre-trained language models have brought remarkable progress in NLP.
We empirically demonstrate the effectiveness of self-supervised pre-training and data augmentation for zero-shot multi-lingual machine translation.
arXiv Detail & Related papers (2022-10-01T17:01:30Z)
- Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages [86.08359401867577]
Back-translation is widely known for its effectiveness for neural machine translation when little to no parallel data is available.
We propose performing back-translation via code summarization and generation.
We show that our proposed approach performs competitively with state-of-the-art methods.
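A minimal sketch of back-translation via summarization and generation, assuming the summarizer and generator are provided as callables; the data layout is an illustrative assumption.

```python
from typing import Callable, List, Tuple

def summarize_generate_backtranslate(
    summarize: Callable[[str], str],      # code -> natural-language summary
    generate: Callable[[str, str], str],  # (summary, target_language) -> code
    monolingual_code: List[str],
    src_lang: str,
) -> List[Tuple[str, str]]:
    """Sketch: instead of translating code directly, go through a natural-language
    summary to create synthetic (source, target) training pairs."""
    pairs = []
    for code in monolingual_code:
        summary = summarize(code)
        synthetic_source = generate(summary, src_lang)
        pairs.append((synthetic_source, code))  # synthetic source, real target
    return pairs
```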
arXiv Detail & Related papers (2022-05-23T08:20:41Z)
- Lifting the Curse of Multilinguality by Pre-training Modular Transformers [72.46919537293068]
Multilingual pre-trained models suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
We introduce language-specific modules, which allow us to grow the total capacity of the model while keeping the number of trainable parameters per language constant.
Our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
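A minimal sketch of a layer with language-specific modules, assuming an adapter-style bottleneck on top of a shared block; the dimensions, routing, and bottleneck design are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LanguageModularLayer(nn.Module):
    """Sketch: a shared transformer block followed by a per-language module.
    Only the active language's module runs, so the per-language parameter count
    stays constant while total capacity grows with each added language."""

    def __init__(self, dim: int = 768, bottleneck: int = 128, languages=("en", "de")):
        super().__init__()
        self.shared = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.lang_modules = nn.ModuleDict({
            lang: nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
            for lang in languages
        })

    def add_language(self, lang: str, dim: int = 768, bottleneck: int = 128) -> None:
        # Languages can be added post-hoc without touching the shared weights.
        self.lang_modules[lang] = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

    def forward(self, x: torch.Tensor, lang: str) -> torch.Tensor:
        h = self.shared(x)
        return h + self.lang_modules[lang](h)  # residual language-specific module
```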
arXiv Detail & Related papers (2022-05-12T17:59:56Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
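A minimal sketch of one self-training round on unlabeled target-language data, assuming a confidence-thresholded pseudo-labeling scheme; the threshold and helper signatures are illustrative assumptions.

```python
from typing import Callable, List, Tuple

def self_training_round(
    predict: Callable[[str], Tuple[str, float]],        # example -> (answer, confidence)
    retrain: Callable[[List[Tuple[str, str]]], None],   # fine-tunes on pseudo-labeled data
    unlabeled: List[str],
    threshold: float = 0.9,                             # confidence cut-off (assumption)
) -> List[Tuple[str, str]]:
    """One self-training round: pseudo-label unlabeled target-language examples
    and keep only confident predictions for the next fine-tuning step."""
    pseudo = []
    for example in unlabeled:
        answer, confidence = predict(example)
        if confidence >= threshold:
            pseudo.append((example, answer))
    retrain(pseudo)
    return pseudo
```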
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
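A minimal sketch of the distillation step that amalgamates a language-branch teacher into a single student, using a standard soft-target KL term plus cross-entropy on gold labels; the temperature and weighting are illustrative assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits: torch.Tensor,
                 teacher_logits: torch.Tensor,  # from the matching language-branch teacher
                 labels: torch.Tensor,
                 T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """KL between softened teacher/student distributions plus cross-entropy on
    gold labels (T and alpha are illustrative and would be tuned)."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```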
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
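A minimal sketch of the interactive usage scenario with a tiny character bigram language model that is updated as corrections are collected and used to rank candidate normalizations; the model and smoothing are illustrative stand-ins for the character language models compared in the paper.

```python
from collections import defaultdict
import math

class CharBigramLM:
    """Tiny character bigram LM with crude add-one smoothing, updated
    interactively as corrected spellings are collected."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, word: str) -> None:
        # "^" and "$" mark word boundaries.
        for a, b in zip("^" + word, word + "$"):
            self.counts[a][b] += 1

    def logprob(self, word: str) -> float:
        total = 0.0
        for a, b in zip("^" + word, word + "$"):
            row = self.counts[a]
            total += math.log((row[b] + 1) / (sum(row.values()) + 30))  # 30 ~ alphabet size (assumption)
        return total

    def best_candidate(self, candidates: list) -> str:
        return max(candidates, key=self.logprob)

# Toy usage: update with a few "collected corrections", then rank candidates.
lm = CharBigramLM()
for w in ["kitab", "kitap", "kalem"]:
    lm.update(w)
print(lm.best_candidate(["kitap", "ktap"]))
```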
arXiv Detail & Related papers (2020-10-20T17:31:07Z)