TransLLaMa: LLM-based Simultaneous Translation System
- URL: http://arxiv.org/abs/2402.04636v1
- Date: Wed, 7 Feb 2024 07:39:27 GMT
- Title: TransLLaMa: LLM-based Simultaneous Translation System
- Authors: Roman Koshkin, Katsuhito Sudoh and Satoshi Nakamura
- Abstract summary: We show that a decoder-only large language model (LLM) can control input segmentation directly by generating a special "wait" token.
This obviates the need for a separate policy and enables the LLM to perform English-German and English-Russian SiMT tasks.
We also evaluated closed-source models such as GPT-4, which displayed encouraging results in performing the SiMT task without prior training.
- Score: 18.27477980076409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoder-only large language models (LLMs) have recently demonstrated
impressive capabilities in text generation and reasoning. Nonetheless, they
have limited applications in simultaneous machine translation (SiMT), currently
dominated by encoder-decoder transformers. This study demonstrates that, after
fine-tuning on a small dataset comprising causally aligned source and target
sentence pairs, a pre-trained open-source LLM can control input segmentation
directly by generating a special "wait" token. This obviates the need for a
separate policy and enables the LLM to perform English-German and
English-Russian SiMT tasks with BLEU scores that are comparable to those of
specific state-of-the-art baselines. We also evaluated closed-source models
such as GPT-4, which displayed encouraging results in performing the SiMT task
without prior training (zero-shot), indicating a promising avenue for enhancing
future SiMT systems.
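The abstract's key idea is that the model itself acts as the segmentation policy: at each step it either emits a "wait" token (read more input) or a translation token (write output). The following is a toy sketch of that read/write loop; `fake_llm`, the token name `<wait>`, and the uppercasing "translation" are illustrative stand-ins, not the paper's actual model or interface.

```python
# Toy sketch of the "wait"-token read/write loop described in the abstract.
# fake_llm is an illustrative stub, NOT the paper's fine-tuned model: it
# waits until it has read one more source word than it has written, then
# "translates" the next word by uppercasing it.

WAIT = "<wait>"

def fake_llm(read_words, target_words):
    """Stand-in for the fine-tuned LLM's next-token prediction."""
    if len(read_words) <= len(target_words):
        return WAIT                                   # ask for more input
    return read_words[len(target_words)].upper()      # emit a target word

def simultaneous_translate(source_words, model=fake_llm):
    """READ one source word whenever the model emits the wait token,
    otherwise WRITE the model's output token. The model itself thus
    plays the role of the segmentation policy."""
    read, target, pending = [], [], list(source_words)
    while len(target) < len(source_words):
        token = model(read, target)
        if token == WAIT and pending:
            read.append(pending.pop(0))               # READ action
        elif token != WAIT:
            target.append(token)                      # WRITE action
    return target

print(simultaneous_translate(["guten", "morgen"]))    # → ['GUTEN', 'MORGEN']
```

The point of the sketch is that no separate policy network decides when to read: the same next-token distribution that produces translations also produces the wait decision, which is what the fine-tuning on causally aligned sentence pairs teaches.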
Related papers
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- LLMs Are Zero-Shot Context-Aware Simultaneous Translators [16.260150631363313]
Large language models (LLMs) have come to the spotlight thanks to their generality and strong performance in a wide range of language tasks.
Here we show that open-source LLMs perform on par with or better than some state-of-the-art baselines in simultaneous machine translation (SiMT) tasks.
arXiv Detail & Related papers (2024-06-19T11:57:42Z)
- Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages [2.53740603524637]
Machine translation (MT) models produce excellent multilingual representations, resulting in strong translation performance even for low-resource languages.
In this work, we get the best of both worlds by integrating MT encoders directly into language backbones via sample-efficient self-distillation.
The resulting MT-LLMs preserve the inherent multilingual representational alignment from the MT encoder, allowing lower-resource languages to tap into the rich knowledge embedded in English-centric LLMs.
arXiv Detail & Related papers (2024-06-18T16:00:20Z)
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained large language model (LLM).
By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z)
- Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models [4.873927154453253]
Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, are now capable of performance near or better than the state of the art in a variety of downstream natural language processing tasks.
Simul-LLM is the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.
arXiv Detail & Related papers (2023-12-07T20:42:05Z)
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality.
We propose Cooperative Decoding (CoDec) which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Language Models are Good Translators [63.528370845657896]
We show that a single language model (LM4MT) can achieve comparable performance with strong encoder-decoder NMT models.
Experiments on pivot-based and zero-shot translation tasks show that LM4MT can outperform the encoder-decoder NMT model by a large margin.
arXiv Detail & Related papers (2021-06-25T13:30:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.