Simultaneous Machine Translation with Large Language Models
- URL: http://arxiv.org/abs/2309.06706v2
- Date: Thu, 15 Feb 2024 06:50:00 GMT
- Title: Simultaneous Machine Translation with Large Language Models
- Authors: Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan
Shareghi, Gholamreza Haffari
- Abstract summary: We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
- Score: 51.470478122113356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world simultaneous machine translation (SimulMT) systems face more
challenges than just the quality-latency trade-off. They also need to address
issues related to robustness with noisy input, processing long contexts, and
flexibility for knowledge injection. These challenges demand models with strong
language understanding and generation capabilities, which dedicated MT models often lack. In this paper, we investigate the possibility of
applying Large Language Models (LLM) to SimulMT tasks by using existing
incremental-decoding methods with a newly proposed RALCP algorithm for latency
reduction. We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset. The results show that LLM
outperforms dedicated MT models in terms of BLEU and LAAL metrics. Further
analysis indicates that LLM has advantages in terms of tuning efficiency and
robustness. However, it is important to note that the computational cost of LLM
remains a significant obstacle to its application in SimulMT. (We will release our code, weights, and data with publication.)
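The abstract only names the ingredients (incremental decoding plus the proposed RALCP policy for latency reduction), so the snippet below is a minimal sketch of the general agreement-based idea rather than the paper's actual algorithm: after each incoming source chunk the model re-decodes with beam search, and only the prefix that a sufficient fraction of the beam candidates agree on is committed to the output. The `beam_translate` callable, the `threshold` value, and the chunking scheme are illustrative assumptions.

```python
from typing import Callable, List

def longest_agreed_prefix(candidates: List[List[str]], threshold: float) -> List[str]:
    """Return the longest token prefix on which at least `threshold` of the
    candidate hypotheses agree (a relaxed longest-common-prefix vote)."""
    if not candidates:
        return []
    prefix: List[str] = []
    min_len = min(len(c) for c in candidates)
    for i in range(min_len):
        # Vote on the token at position i across all beam candidates.
        counts: dict = {}
        for cand in candidates:
            counts[cand[i]] = counts.get(cand[i], 0) + 1
        token, votes = max(counts.items(), key=lambda kv: kv[1])
        if votes / len(candidates) >= threshold:
            prefix.append(token)
        else:
            break
    return prefix

def incremental_translate(
    source_chunks: List[List[str]],
    beam_translate: Callable[[List[str], List[str]], List[List[str]]],
    threshold: float = 0.6,
) -> List[str]:
    """Simulate incremental decoding: each time a new source chunk arrives,
    re-decode the available source given the already-committed target prefix,
    then commit only the continuation that enough beam candidates agree on."""
    source: List[str] = []
    committed: List[str] = []
    for chunk in source_chunks:
        source.extend(chunk)
        # beam_translate is assumed to return full-beam token sequences that
        # all start with the committed prefix (e.g. via prefix-forced decoding).
        candidates = beam_translate(source, committed)
        continuations = [c[len(committed):] for c in candidates]
        committed.extend(longest_agreed_prefix(continuations, threshold))
    # Once the source is complete, a final full decode would normally flush
    # the remaining target tokens; here we just return what was committed.
    return committed
```

With `threshold = 1.0` this reduces to the standard longest-common-prefix (local agreement) policy; lowering the threshold commits tokens earlier, trading some stability for lower latency, which is the latency-reduction motivation stated in the abstract.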
Related papers
- Can General-Purpose Large Language Models Generalize to English-Thai Machine Translation? [2.1969983462375318]
Large language models (LLMs) perform well on common tasks but struggle with generalization in low-resource and low-computation settings.
We examine this limitation by testing various LLMs and specialized translation models on English-Thai machine translation and code-switching datasets.
arXiv Detail & Related papers (2024-10-22T16:26:03Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages [2.53740603524637]
Machine translation models (MT) produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages.
In this work, we get the best of both worlds by integrating MT encoders directly into language backbones via sample-efficient self-distillation.
The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to tap into the rich knowledge embedded in English-centric LLMs.
arXiv Detail & Related papers (2024-06-18T16:00:20Z)
- Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression [40.4998607679863]
Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data.
This paper focuses on TG-SFT, which can synthetically generate SFT data for the instruction tuning steps.
arXiv Detail & Related papers (2024-06-17T09:17:40Z)
- Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models [4.873927154453253]
Large language models (LLMs) with billions of parameters and pretrained on massive amounts of data now achieve performance near or better than the state of the art on a variety of downstream natural language processing tasks.
Simul-LLM is the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.
arXiv Detail & Related papers (2023-12-07T20:42:05Z)
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) have the potential to achieve superior translation quality.
We propose Cooperative Decoding (CoDec), which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework that uses comparison examples to teach LLMs to translate.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning (a generic sketch of this kind of preference loss follows this list).
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
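The TIM entry above mentions training on correct and incorrect translations with a preference loss but does not give the objective, so the sketch below only illustrates a generic margin-based preference loss over sequence log-probabilities; the tensor inputs and the `margin` value are illustrative assumptions, not TIM's actual formulation.

```python
import torch

def preference_loss(
    logp_correct: torch.Tensor,    # model log-probability of each correct translation
    logp_incorrect: torch.Tensor,  # model log-probability of each incorrect translation
    margin: float = 1.0,
) -> torch.Tensor:
    """Margin-based preference loss: penalize any pair where the incorrect
    translation is not at least `margin` log-prob worse than the correct one."""
    return torch.clamp(margin - (logp_correct - logp_incorrect), min=0.0).mean()

# Dummy batch of three comparison pairs (sequence log-probabilities).
good = torch.tensor([-10.0, -12.0, -9.0])
bad = torch.tensor([-10.5, -15.0, -9.2])
loss = preference_loss(good, bad)  # larger when incorrect translations score too well
```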