SiLLM: Large Language Models for Simultaneous Machine Translation
- URL: http://arxiv.org/abs/2402.13036v1
- Date: Tue, 20 Feb 2024 14:23:34 GMT
- Title: SiLLM: Large Language Models for Simultaneous Machine Translation
- Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng
- Abstract summary: Simultaneous Machine Translation (SiMT) generates translations while reading the source sentence.
Existing SiMT methods employ a single model to concurrently determine the policy and generate the translations.
We propose SiLLM, which delegates the two sub-tasks to separate agents.
- Score: 41.303764786790616
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Simultaneous Machine Translation (SiMT) generates translations while reading
the source sentence, necessitating a policy to determine the optimal timing for
reading and generating words. Despite the remarkable performance achieved by
Large Language Models (LLMs) across various NLP tasks, existing SiMT methods
predominantly focus on conventional transformers, employing a single model to
concurrently determine the policy and generate the translations. However, given
the complexity of SiMT, it is challenging to effectively address both tasks
with a single model. Therefore, there is a need to decouple the SiMT task into
policy-decision and translation sub-tasks. We propose SiLLM, which delegates
the two sub-tasks to separate agents, thereby incorporating LLMs into SiMT. The
policy-decision agent is managed by a conventional SiMT model, responsible for
determining the translation policy. The translation agent, leveraging the
capabilities of LLMs, generates the translation from the partial source sentence.
The two agents collaborate to accomplish SiMT. To facilitate the application of
token-level policies determined by conventional SiMT models to LLMs, we propose
a word-level policy adapted for LLMs. Experiments on two datasets demonstrate
that, with a small amount of data for fine-tuning the LLM, SiLLM attains
state-of-the-art performance.
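As a reading aid, here is a minimal, hypothetical sketch of how the two agents described above might collaborate. It assumes a word-level wait-k rule standing in for the policy-decision agent and hides the LLM translation agent behind a generic generate_next_word callback; these names, and the wait-k choice itself, are illustrative assumptions rather than the paper's released implementation.

```python
# Illustrative sketch only: the policy rule and the LLM wrapper below are
# hypothetical stand-ins, not the authors' code.

def word_level_wait_k_policy(words_read: int, words_written: int, k: int = 3) -> str:
    """Word-level wait-k rule standing in for the conventional SiMT
    policy-decision agent: READ until the source leads by k words,
    then alternate WRITE and READ."""
    return "READ" if words_read - words_written < k else "WRITE"


def simultaneous_translate(source_words, generate_next_word, k: int = 3) -> str:
    """Collaboration loop between the two agents. `generate_next_word`
    stands in for the LLM translation agent: it takes the partial source
    and the target prefix and returns the next target word, or None when
    the translation is complete."""
    words_read, target = 0, []
    while True:
        if words_read < len(source_words) and \
           word_level_wait_k_policy(words_read, len(target), k) == "READ":
            words_read += 1                      # reveal one more source word
            continue
        next_word = generate_next_word(source_words[:words_read], target)
        if next_word is None:                    # LLM signals end of translation
            break
        target.append(next_word)
    return " ".join(target)
```

In this decoupled form, any incremental policy (fixed wait-k, adaptive, or learned) can be swapped into the policy function without touching the LLM side, which reflects the separation of the two sub-tasks the abstract argues for.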
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models [38.49925017512848]
Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence.
Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations.
We introduce Agent-SiMT, a framework combining the strengths of Large Language Models (LLMs) and traditional SiMT methods.
arXiv Detail & Related papers (2024-06-11T03:09:20Z)
- Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation.
We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning.
Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
- DeMPT: Decoding-enhanced Multi-phase Prompt Tuning for Making LLMs Be Better Context-aware Translators [26.665489056201725]
We propose an adaptation approach, named Decoding-enhanced Multi-phase Prompt Tuning (DeMPT).
During each phase, different continuous prompts are introduced so that LLMs model different kinds of information separately.
Experiments show that our approach significantly outperforms the concatenation method.
arXiv Detail & Related papers (2024-02-23T09:01:00Z)
- TransLLaMa: LLM-based Simultaneous Translation System [18.27477980076409]
We show that a decoder-only large language model (LLM) can control input segmentation directly by generating a special "wait" token (a minimal illustrative sketch appears after this list).
This obviates the need for a separate policy and enables the LLM to perform English-German and English-Russian SiMT tasks.
We also evaluated closed-source models such as GPT-4, which displayed encouraging results in performing the SiMT task without prior training.
arXiv Detail & Related papers (2024-02-07T07:39:27Z)
- Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models [4.873927154453253]
Large language models (LLMs) with billions of parameters, pretrained on massive amounts of data, now achieve performance near or better than the state of the art on a variety of downstream natural language processing tasks.
Simul-LLM is the first open-source fine-tuning and evaluation pipeline development framework for LLMs focused on SimulMT.
arXiv Detail & Related papers (2023-12-07T20:42:05Z)
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality.
We propose Cooperative Decoding (CoDec) which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- Tuning Large language model for End-to-end Speech Translation [7.297914077124909]
This paper introduces LST, a large multimodal model designed to excel at the E2E-ST task.
Experimental results on the MuST-C speech translation benchmark demonstrate that LST-13B achieves BLEU scores of 30.39/41.55/35.33 on the En-De/En-Fr/En-Es language pairs, surpassing previous models and establishing a new state of the art.
arXiv Detail & Related papers (2023-10-03T13:43:50Z)
- Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation [91.57514888410205]
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting.
LLMs can struggle to translate inputs with rare words, which are common in low resource or domain transfer scenarios.
We show that LLM prompting can provide an effective solution for rare words as well, by using prior knowledge from bilingual dictionaries to provide control hints in the prompts.
arXiv Detail & Related papers (2023-02-15T18:46:42Z)
- A Variational Hierarchical Model for Neural Cross-Lingual Summarization [85.44969140204026]
Cross-lingual summarization (CLS) converts a document in one language into a summary in another language.
Existing studies on CLS mainly focus on utilizing pipeline methods or jointly training an end-to-end model.
We propose a hierarchical model for the CLS task, based on the conditional variational auto-encoder.
arXiv Detail & Related papers (2022-03-08T02:46:11Z)
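For contrast with the decoupled-agent loop sketched after the abstract, the following hypothetical sketch (referenced from the TransLLaMa entry above) illustrates the alternative in which the LLM itself controls segmentation by emitting a special "wait" token. The token name and the generate_token callback are assumptions for illustration, not that system's actual interface.

```python
# Hedged sketch of an LLM-controlled read/write policy via a "wait" token.
# All identifiers are hypothetical stand-ins.

WAIT_TOKEN = "<wait>"  # assumed special token added to the vocabulary

def translate_with_wait_token(source_words, generate_token) -> str:
    """Decode loop in which the LLM decides when to read more source:
    emitting WAIT_TOKEN reveals one more source word, any other token is
    committed to the translation. `generate_token(prefix, target)` stands
    in for one LLM decoding step and returns None at end of translation."""
    words_read, target = 0, []
    while True:
        token = generate_token(source_words[:words_read], target)
        if token is None:                      # model ends the translation
            break
        if token == WAIT_TOKEN:
            if words_read < len(source_words):
                words_read += 1                # model asked to see more source
            else:
                break                          # defensive: no source left in this sketch
        else:
            target.append(token)               # commit the token to the output
    return " ".join(target)
```

The design difference from the two-agent sketch is that here no external policy model is needed; the read/write decision is folded into the LLM's own generation.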
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.