Language Tags Matter for Zero-Shot Neural Machine Translation
- URL: http://arxiv.org/abs/2106.07930v1
- Date: Tue, 15 Jun 2021 07:32:36 GMT
- Title: Language Tags Matter for Zero-Shot Neural Machine Translation
- Authors: Liwei Wu, Shanbo Cheng, Mingxuan Wang, Lei Li
- Abstract summary: Language tag (LT) strategies are often adopted to indicate the translation directions in MNMT.
We demonstrate that LTs are not only indicators of translation direction but also crucial to zero-shot translation quality.
Experimental results show that by ignoring the source language tag (SLT) and adding the target language tag (TLT) to the encoder, zero-shot translations can gain up to +8 BLEU over other LT strategies.
- Score: 17.353423698436547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual Neural Machine Translation (MNMT) has attracted widespread
interest due to its efficiency. An exciting advantage of MNMT models is that
they can also translate between unsupervised (zero-shot) language directions.
Language tag (LT) strategies are often adopted to indicate the translation
directions in MNMT. In this paper, we demonstrate that LTs are not only
indicators of translation direction but also crucial to zero-shot translation
quality. Unfortunately, previous work tends to ignore the importance of LT
strategies. We demonstrate that a proper LT strategy can enhance the
consistency of semantic representations and alleviate the off-target issue in
zero-shot directions. Experimental results show that by ignoring the source
language tag (SLT) and adding the target language tag (TLT) to the encoder,
zero-shot translations can gain up to +8 BLEU over other LT strategies on the
IWSLT17, Europarl, and TED Talks translation tasks.
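To make the language-tag strategies concrete, here is a minimal preprocessing sketch in Python. The tag format (`<xx>` / `<2xx>` tokens) and function names are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of two language-tag (LT) strategies for MNMT.
# Tag tokens and function names are assumptions, not the paper's code.

def tag_src_tgt(src_tokens, src_lang, tgt_lang):
    """Baseline-style strategy: SLT on the encoder side, TLT on the decoder side."""
    encoder_input = [f"<{src_lang}>"] + src_tokens
    decoder_prefix = [f"<2{tgt_lang}>"]
    return encoder_input, decoder_prefix

def tag_tgt_on_encoder(src_tokens, src_lang, tgt_lang):
    """Strategy favored in the abstract: drop the source language tag (SLT)
    and prepend only the target language tag (TLT) to the encoder input."""
    encoder_input = [f"<2{tgt_lang}>"] + src_tokens
    decoder_prefix = []  # decoder starts from the usual <bos> token
    return encoder_input, decoder_prefix

# Example: a German->French zero-shot direction in an English-centric model.
enc_in, dec_prefix = tag_tgt_on_encoder(["Guten", "Morgen"], "de", "fr")
print(enc_in)      # ['<2fr>', 'Guten', 'Morgen']
print(dec_prefix)  # []
```

The zero-shot case is where the strategies diverge: for a direction never seen in training, the encoder-side TLT is the only signal telling the model which language to produce.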
Related papers
- LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation [84.38105530043741]
We propose a simple yet effective strategy named the Language Converter Strategy (LCS).
By introducing the target language embedding into the top encoder layers, LCS mitigates confusion in the encoder and ensures stable language indication for the decoder.
Experimental results on MultiUN, TED, and OPUS-100 datasets demonstrate that LCS could significantly mitigate the off-target issue.
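A rough sketch of the LCS idea, assuming a standard Transformer-style encoder in PyTorch; the layer split, the additive injection, and all names are assumptions for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class LanguageConverterEncoder(nn.Module):
    """Sketch: inject a target-language embedding into the TOP encoder layers
    only, leaving the bottom layers language-agnostic. The split point and the
    additive injection are illustrative assumptions."""

    def __init__(self, layers, num_langs, d_model, num_top_layers=2):
        super().__init__()
        self.bottom = nn.ModuleList(layers[:-num_top_layers])
        self.top = nn.ModuleList(layers[-num_top_layers:])
        self.tgt_lang_embed = nn.Embedding(num_langs, d_model)

    def forward(self, x, tgt_lang_id):
        # Bottom layers see no language signal.
        for layer in self.bottom:
            x = layer(x)
        # Add the target-language embedding, broadcast over the sequence.
        x = x + self.tgt_lang_embed(tgt_lang_id).unsqueeze(1)
        # Top layers carry a stable target-language indication to the decoder.
        for layer in self.top:
            x = layer(x)
        return x
```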
arXiv Detail & Related papers (2024-06-05T02:52:17Z)
- Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice? [33.376648335299116]
Large language models (LLMs) display strong translation capability after being fine-tuned on as few as 32 parallel sentences.
Fine-tuning LLMs with only English on the target side can lead to task misinterpretation, which hinders translation into non-English languages.
Synthesized data in an under-represented language has a less pronounced effect.
arXiv Detail & Related papers (2024-04-22T12:21:12Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
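As a loose illustration of making the translation direction explicit in the instruction, here is a hypothetical data-construction sketch; the prompt template and field names are assumptions and do not reproduce the paper's two-stage recipe.

```python
# Hypothetical instruction-tuning example builder. The template is an
# assumption used only to show the translation direction stated explicitly,
# which is what off-target mitigation hinges on.

def build_example(src_text, src_lang, tgt_lang, tgt_text):
    instruction = f"Translate the following {src_lang} sentence into {tgt_lang}."
    return {
        "instruction": instruction,
        "input": src_text,
        "output": tgt_text,
    }

example = build_example("Guten Morgen", "German", "French", "Bonjour")
print(example["instruction"])  # Translate the following German sentence into French.
```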
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
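A hedged sketch of what a contextual word-level translation (C-WLT) query to a PLM might look like; the prompt template below is an assumption for illustration, not the paper's exact wording.

```python
# Hypothetical C-WLT prompt builder: ask a pretrained LM to translate a single
# word given its sentence context. The template is an illustrative assumption.

def cwlt_prompt(sentence, word, tgt_lang):
    return (
        f'In the sentence "{sentence}", '
        f'the word "{word}" is translated into {tgt_lang} as:'
    )

print(cwlt_prompt("She sat on the bank of the river.", "bank", "Spanish"))
# -> In the sentence "She sat on the bank of the river.", the word "bank" is translated into Spanish as:
```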
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach that regularizes NMT models at both the representation level and the gradient level.
Our results demonstrate that the approach is highly effective at both reducing off-target translations and improving zero-shot translation performance.
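The representation-level part can be pictured as an auxiliary agreement loss between encoder outputs of parallel sentences; the specific loss below (mean-pooled cosine distance) is an assumption for illustration, not the paper's exact regularizer.

```python
import torch
import torch.nn.functional as F

def representation_agreement_loss(enc_src, enc_tgt):
    """Encourage the encoder to map parallel inputs from different source
    languages to similar representations (mean-pool + cosine distance).
    Illustrative stand-in for a representation-level regularizer.

    enc_src, enc_tgt: (batch, seq_len, d_model) encoder outputs of a
    parallel pair, e.g. the same content encoded from two source languages.
    """
    pooled_src = enc_src.mean(dim=1)
    pooled_tgt = enc_tgt.mean(dim=1)
    return (1.0 - F.cosine_similarity(pooled_src, pooled_tgt, dim=-1)).mean()

# Usage sketch: total_loss = nmt_loss + reg_weight * representation_agreement_loss(a, b)
```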
arXiv Detail & Related papers (2021-09-10T10:52:21Z)