Towards Boosting Many-to-Many Multilingual Machine Translation with
Large Language Models
- URL: http://arxiv.org/abs/2401.05861v2
- Date: Wed, 7 Feb 2024 08:37:15 GMT
- Title: Towards Boosting Many-to-Many Multilingual Machine Translation with
Large Language Models
- Authors: Pengzhi Gao, Zhongjun He, Hua Wu, Haifeng Wang
- Abstract summary: This paper focuses on boosting many-to-many multilingual translation of large language models (LLMs) with an emphasis on zero-shot translation directions.
We introduce a cross-lingual consistency regularization, XConST, to bridge the representation gap among different languages.
Experimental results on ALMA, Tower, and LLaMA-2 show that our approach consistently improves translation performance.
- Score: 47.39529535727593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training paradigm for machine translation has gradually shifted, from
learning neural machine translation (NMT) models with extensive parallel
corpora to instruction finetuning on multilingual large language models (LLMs)
with high-quality translation pairs. In this paper, we focus on boosting
many-to-many multilingual translation of LLMs with an emphasis on zero-shot
translation directions. We demonstrate that prompt strategies adopted during
finetuning are crucial to zero-shot translation and introduce a cross-lingual
consistency regularization, XConST, to bridge the representation gap among
different languages and improve zero-shot translation performance. XConST is
not a new method, but a version of CrossConST (Gao et al., 2023a) adapted for
translation instruction finetuning with LLMs. Experimental results on ALMA (Xu
et al., 2023), Tower (Team, 2024), and LLaMA-2 (Touvron et al., 2023) show that
our approach consistently improves translation performance. Our implementations
are available at https://github.com/gpengzhi/CrossConST-LLM.
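For intuition, here is a minimal PyTorch-style sketch of a cross-lingual consistency regularizer in the spirit of XConST / CrossConST, assuming the regularizer is a token-level KL term between the output distributions the model produces for two prompt views of the same training pair (one built from the source sentence, one from the target-side reference). The prompt construction, KL direction, weighting, and all function names below are illustrative assumptions, not the authors' API; their actual implementation is in the repository linked above.

```python
# Illustrative sketch only: a CrossConST-style consistency regularizer for
# translation instruction finetuning with a HuggingFace-style causal LM.
# Names, prompt views, and the KL direction are assumptions, not the paper's API.
import torch.nn.functional as F

def xconst_style_loss(model, src_view, ref_view, labels, alpha=0.5):
    """Cross-entropy on the translation pair plus a KL term tying together the
    token distributions from two prompt views of the same pair.

    src_view: tokenized prompt built from the source sentence
    ref_view: tokenized prompt built from the target-side reference, padded so
              the response tokens sit at the same positions as in src_view
    labels:   target token ids shared by both views, prompt positions set to -100
    """
    out_src = model(**src_view, labels=labels)   # standard instruction-tuning loss
    out_ref = model(**ref_view, labels=labels)

    # Align next-token logits with labels (causal LM convention) and keep only
    # the response positions.
    logp_src = F.log_softmax(out_src.logits[:, :-1, :], dim=-1)
    logp_ref = F.log_softmax(out_ref.logits[:, :-1, :], dim=-1)
    mask = (labels[:, 1:] != -100).unsqueeze(-1).float()

    # F.kl_div(input, target, log_target=True) computes KL(target || input),
    # here KL(P_ref || P_src); the paper may use a different direction or weight.
    kl = F.kl_div(logp_src, logp_ref, log_target=True, reduction="none")
    kl = (kl * mask).sum() / mask.sum().clamp(min=1.0)

    return out_src.loss + alpha * kl
```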
Related papers
- How Multilingual Are Large Language Models Fine-Tuned for Translation? [13.612090779277281]
Fine-tuning large language models (LLMs) on parallel text has been shown to outperform dedicated translation systems trained in a supervised fashion on much larger amounts of parallel data.
How does translation fine-tuning impact the MT capabilities of LLMs for zero-shot languages, zero-shot language pairs, and translation tasks that do not involve English?
We find that translation fine-tuning improves translation quality even for zero-shot languages on average, but that the impact is uneven depending on the language pairs involved.
arXiv Detail & Related papers (2024-05-30T22:08:20Z)
- A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models [27.777372498182864]
We propose a novel fine-tuning approach for generative large language models (LLMs).
Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data.
Using LLaMA-2 as the underlying model, we show that it achieves an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance.
arXiv Detail & Related papers (2023-09-20T22:53:15Z)
- TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework that uses examples in comparison to teach LLMs to translate.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning (a generic sketch of such a loss appears after this list).
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
- Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization [46.09132547431629]
We introduce MuSR: a one-for-all Multilingual Sentence Representation model that supports more than 220 languages.
We train a multilingual Transformer encoder, coupled with an auxiliary Transformer decoder, by adopting a multilingual NMT framework.
Experimental results on multilingual similarity search and bitext mining tasks show the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-12T07:39:06Z)
- Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions [68.01449013641532]
Large-scale pretrained language models (LLMs) have shown strong abilities in multilingual translation.
We present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation.
arXiv Detail & Related papers (2023-05-24T12:00:24Z)
- Improving Zero-shot Multilingual Neural Machine Translation by Leveraging Cross-lingual Consistency Regularization [46.09132547431629]
The multilingual neural machine translation (NMT) model has a promising capability of zero-shot translation.
This paper introduces a cross-lingual consistency regularization, CrossConST, to bridge the representation gap among different languages.
arXiv Detail & Related papers (2023-05-12T08:32:18Z)
- XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders [89.0059978016914]
We present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer and fine-tunes it with multilingual parallel data.
This simple method achieves significant improvements on a WMT dataset with 10 language pairs and the OPUS-100 corpus with 94 pairs.
arXiv Detail & Related papers (2020-12-31T11:16:51Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
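As flagged in the TIM entry above, comparison-based training can be pictured as a margin preference loss over the sequence log-probabilities of a correct and an incorrect translation. The sketch below is a generic stand-in under that assumption, with hypothetical names and a HuggingFace-style causal LM interface; it is not TIM's actual objective.

```python
# Illustrative sketch only: a margin-based preference loss over a correct
# ("chosen") and an incorrect ("rejected") translation, in the spirit of the
# TIM entry above. Names and the exact objective are assumptions.
import torch.nn.functional as F

def sequence_logprob(model, batch, labels):
    """Total log-probability of the response tokens under a HuggingFace-style
    causal LM; prompt positions in `labels` are set to -100."""
    logits = model(**batch).logits[:, :-1, :]
    tgt = labels[:, 1:]
    mask = (tgt != -100).float()
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, tgt.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return (token_logp * mask).sum(dim=-1)

def preference_loss(model, chosen, chosen_labels, rejected, rejected_labels, margin=1.0):
    # Push the correct translation's score above the incorrect one's by `margin`.
    lp_good = sequence_logprob(model, chosen, chosen_labels)
    lp_bad = sequence_logprob(model, rejected, rejected_labels)
    return F.relu(margin - (lp_good - lp_bad)).mean()
```

In practice such a preference term would typically be added to the standard translation instruction-tuning loss rather than used on its own.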