Language Model Prior for Low-Resource Neural Machine Translation
- URL: http://arxiv.org/abs/2004.14928v3
- Date: Mon, 26 Oct 2020 08:56:46 GMT
- Title: Language Model Prior for Low-Resource Neural Machine Translation
- Authors: Christos Baziotis, Barry Haddow, Alexandra Birch
- Abstract summary: We propose a novel approach to incorporate an LM as a prior in a neural translation model (TM).
We add a regularization term, which pushes the output distributions of the TM to be probable under the LM prior.
Results on two low-resource machine translation datasets show clear improvements even with limited monolingual data.
- Score: 85.55729693003829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scarcity of large parallel corpora is an important obstacle for neural
machine translation. A common solution is to exploit the knowledge of language
models (LM) trained on abundant monolingual data. In this work, we propose a
novel approach to incorporate an LM as a prior in a neural translation model (TM).
Specifically, we add a regularization term, which pushes the output
distributions of the TM to be probable under the LM prior, while avoiding wrong
predictions when the TM "disagrees" with the LM. This objective relates to
knowledge distillation, where the LM can be viewed as teaching the TM about the
target language. The proposed approach does not compromise decoding speed,
because the LM is used only at training time, unlike previous work that
requires it during inference. We present an analysis of the effects that
different methods have on the distributions of the TM. Results on two
low-resource machine translation datasets show clear improvements even with
limited monolingual data.
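The regularization idea above can be sketched in a few lines: the translation loss is the usual negative log-likelihood of the gold token, plus a weighted divergence term that keeps the TM's output distribution probable under the LM prior. The function and parameter names below (`lm_prior_loss`, `lam`) are illustrative assumptions, and the paper's actual objective differs in details (e.g. temperature scaling and how TM/LM disagreements are handled), so treat this as a sketch of the general idea only.

```python
import math

def lm_prior_loss(tm_probs, lm_probs, target, lam=0.5):
    """Toy sketch of a translation loss with an LM prior as regularizer.

    tm_probs: TM's output distribution over the vocabulary (list of floats).
    lm_probs: LM prior's distribution over the same vocabulary.
    target:   index of the gold target token.
    lam:      regularizer weight (hypothetical name, not from the paper).
    """
    # Standard translation loss: negative log-likelihood of the gold token.
    nll = -math.log(tm_probs[target])
    # KL(TM || LM): pushes the TM's distribution to stay probable
    # under the LM prior (direction of the divergence is an assumption here).
    kl = sum(p * math.log(p / q) for p, q in zip(tm_probs, lm_probs) if p > 0)
    return nll + lam * kl
```

When the TM and LM distributions coincide, the KL term vanishes and the loss reduces to the plain negative log-likelihood; the further the TM drifts from the prior, the larger the penalty.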
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement [26.26493253161022]
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT).
We introduce a systematic LLM-based self-refinement translation framework, named TEaR.
arXiv Detail & Related papers (2024-02-26T07:58:12Z)
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - Prevent the Language Model from being Overconfident in Neural Machine
Translation [21.203435303812043]
We propose a Margin-based Token-level Objective (MTO) and a Margin-based Sentence-level Objective (MSO) to maximize the margin, preventing the LM from becoming overconfident.
Experiments on WMT14 English-to-German, WMT19 Chinese-to-English, and WMT14 English-to-French translation tasks demonstrate the effectiveness of our approach.
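For intuition, the margin idea can be sketched as a hinge-style penalty that fires when the LM's probability of the gold token comes within a margin of the TM's. The function name and the `gamma` hyper-parameter below are hypothetical; the actual MTO/MSO objectives in the paper are defined over full sequences at the token and sentence level, so this is only an illustration of the core mechanism.

```python
def margin_penalty(p_tm, p_lm, gamma=0.2):
    """Hinge-style sketch in the spirit of a margin-based objective.

    p_tm:  probability the translation model assigns to the gold token.
    p_lm:  probability the language model assigns to the gold token.
    gamma: desired margin between the two (hypothetical name).

    Returns zero when the TM already beats the LM by at least gamma,
    and a positive penalty when the LM is "overconfident" relative to the TM.
    """
    return max(0.0, gamma - (p_tm - p_lm))
```

Minimizing this penalty pushes the TM's probability of the gold token to exceed the LM's by at least the margin, so the LM cannot dominate the prediction.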
arXiv Detail & Related papers (2021-05-24T05:34:09Z)
- Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z)
- El Departamento de Nosotros: How Machine Translated Corpora Affects Language Models in MRC Tasks [0.12183405753834563]
Pre-training large-scale language models (LMs) requires huge amounts of text corpora.
We study the caveats of applying directly translated corpora for fine-tuning LMs for downstream natural language processing tasks.
We show that careful curation along with post-processing lead to improved performance and overall LMs robustness.
arXiv Detail & Related papers (2020-07-03T22:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.