Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level
- URL: http://arxiv.org/abs/2406.15741v3
- Date: Tue, 29 Oct 2024 05:15:09 GMT
- Title: Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level
- Authors: Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu
- Abstract summary: General-purpose Large Language Models (LLMs) have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content.
However, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data.
We develop MT-Ladder, a novel model-agnostic and cost-effective tool to refine the performance of general LLMs for MT.
- Score: 9.699022347910121
- Abstract: General-purpose Large Language Models (LLMs) like GPT-4 have achieved remarkable advancements in machine translation (MT) by leveraging extensive web content. On the other hand, translation-specific LLMs are built by pre-training on domain-specific monolingual corpora and fine-tuning with human-annotated translation data. Despite their superior performance, these methods demand either an unprecedented scale of computing and data or substantial human editing and annotation efforts. In this paper, we develop MT-Ladder, a novel model-agnostic and cost-effective tool to refine the performance of general LLMs for MT. MT-Ladder is trained on pseudo-refinement triplets, which can be easily obtained from existing LLMs without additional human cost. During training, we propose a hierarchical fine-tuning strategy with an easy-to-hard schema, improving MT-Ladder's refining performance progressively. The trained MT-Ladder can be seamlessly integrated with any general-purpose LLM to boost its translation performance. By utilizing Gemma-2B/7B as the backbone, MT-Ladder-2B can elevate raw translations to the level of top-tier open-source models (e.g., refining BigTranslate-13B with +6.91 BLEU and +3.52 COMET for XX-En), and MT-Ladder-7B can further enhance model performance to be on par with the state-of-the-art GPT-4. Extensive ablation and analysis corroborate the effectiveness of MT-Ladder in diverse settings. Our code is available at https://github.com/fzp0424/MT-Ladder
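The abstract mentions pseudo-refinement triplets and an easy-to-hard fine-tuning schedule without spelling out either. The following is a minimal sketch of one way such data could be assembled and ordered, not the authors' implementation. It assumes a triplet holds a source sentence, a raw translation sampled from an existing LLM, and an existing reference used as the refinement target, and that difficulty is judged by how far the raw translation is from the reference. All names (`Triplet`, `build_triplets`, `split_easy_to_hard`, `token_overlap`, `curriculum`) and both thresholds are hypothetical, and the token-overlap score is only a toy stand-in for a learned metric such as COMET.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Triplet:
    source: str        # sentence to translate
    intermediate: str  # raw translation sampled from an existing LLM (assumed)
    reference: str     # existing reference, used as the pseudo-refined target (assumed)


def build_triplets(
    pairs: List[Tuple[str, str]], translate: Callable[[str], str]
) -> List[Triplet]:
    """Turn (source, reference) pairs into triplets by sampling a raw translation."""
    return [Triplet(src, translate(src), ref) for src, ref in pairs]


def token_overlap(t: Triplet) -> float:
    """Toy quality score: Jaccard overlap between intermediate and reference tokens."""
    hyp = set(t.intermediate.lower().split())
    ref = set(t.reference.lower().split())
    union = hyp | ref
    return len(hyp & ref) / len(union) if union else 1.0


def split_easy_to_hard(
    triplets: List[Triplet],
    quality: Callable[[Triplet], float],
    easy_min: float = 0.8,   # hypothetical threshold
    hard_max: float = 0.5,   # hypothetical threshold
) -> Dict[str, List[Triplet]]:
    """Bucket triplets: a high-quality intermediate needs little refinement (easy)."""
    stages: Dict[str, List[Triplet]] = {"easy": [], "medium": [], "hard": []}
    for t in triplets:
        score = quality(t)
        if score >= easy_min:
            stages["easy"].append(t)
        elif score < hard_max:
            stages["hard"].append(t)
        else:
            stages["medium"].append(t)
    return stages


def curriculum(stages: Dict[str, List[Triplet]]) -> List[Triplet]:
    """Order the training data so fine-tuning sees easy cases before hard ones."""
    return stages["easy"] + stages["medium"] + stages["hard"]
```

In practice `translate` would wrap a call to whichever LLM supplies the raw translations, and `quality` would be replaced by a reference-based metric; the staging logic stays the same.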
Related papers
- Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages [2.53740603524637]
Machine translation models (MT) produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages.
In this work, we get the best of both worlds by integrating MT encoders directly into language backbones via sample-efficient self-distillation.
The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to tap into the rich knowledge embedded in English-centric LLMs.
arXiv Detail & Related papers (2024-06-18T16:00:20Z)
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities [18.175795328685986]
Fine-tuning large language models (LLMs) for machine translation has shown improvements in overall translation quality.
We perform an extensive translation evaluation on the LLaMA and Falcon family of models with model size ranging from 7 billion up to 65 billion parameters.
We observe a decline in the ability to perform formality steering, to produce technical translations through few-shot examples, and to perform document-level translation.
arXiv Detail & Related papers (2024-05-30T14:25:56Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- On-the-Fly Fusion of Large Language Models and Machine Translation [3.718665608549311]
We propose the on-the-fly ensembling of a machine translation model with an LLM prompted on the same task and input.
We find that a slightly weaker-at-translation LLM can improve the translations of an NMT model, and that ensembling with an LLM can produce better translations than ensembling two stronger MT models.
arXiv Detail & Related papers (2023-11-14T16:49:33Z)
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality.
We propose Cooperative Decoding (CoDec) which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Augmenting Large Language Model Translators via Translation Memories [32.28138249566329]
Using translation memories (TMs) as prompts is a promising approach to in-context learning of machine translation models.
We take a step towards prompting large language models (LLMs) with TMs and making them better translators.
arXiv Detail & Related papers (2023-05-27T04:47:09Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.