SCALE: Synergized Collaboration of Asymmetric Language Translation
Engines
- URL: http://arxiv.org/abs/2309.17061v1
- Date: Fri, 29 Sep 2023 08:46:38 GMT
- Title: SCALE: Synergized Collaboration of Asymmetric Language Translation
Engines
- Authors: Xin Cheng and Xun Wang and Tao Ge and Si-Qing Chen and Furu Wei and
Dongyan Zhao and Rui Yan
- Abstract summary: We introduce a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine.
By introducing the STM's translation into triplet in-context demonstrations, SCALE unlocks the refinement and pivoting abilities of the LLM.
Our experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings.
- Score: 105.8983433641208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce SCALE, a collaborative framework that connects
compact Specialized Translation Models (STMs) and general-purpose Large
Language Models (LLMs) as one unified translation engine. By introducing the
STM's translation into triplet in-context demonstrations, SCALE unlocks the
refinement and pivoting abilities of the LLM, thus mitigating the language bias
of the LLM and the parallel-data bias of the STM, enhancing the LLM's
speciality without sacrificing its generality, and facilitating continual
learning without expensive LLM fine-tuning. Our comprehensive experiments show that SCALE significantly
outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in
challenging low-resource settings. Moreover, in Xhosa-to-English translation,
SCALE achieves a consistent improvement of 4 BLEURT points without tuning the
LLM, and surpasses few-shot GPT-4 by 2.5 COMET points and 3.8 BLEURT points
when equipped with a compact model of merely 600M parameters. SCALE could
also effectively exploit the existing language bias of LLMs by using an
English-centric STM as a pivot for translation between any language pair,
outperforming few-shot GPT-4 by an average of 6 COMET points across eight
translation directions. Furthermore, we provide an in-depth analysis of SCALE's
robustness, translation characteristics, and latency costs, providing a solid
foundation for future studies exploring the potential synergy between LLMs and
more specialized, task-specific models.
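The triplet-demonstration idea described in the abstract can be pictured with a small prompt-construction sketch. The code below is a minimal illustration only: the demonstration layout, the prompt wording, and the helper name `stm_translate` (standing in for a compact model such as an NLLB-600M checkpoint) are assumptions for exposition, not the authors' actual implementation.

```python
# Minimal sketch of SCALE-style refinement prompting, based only on the abstract
# above. The demonstration format, the helper name (stm_translate), and the
# prompt wording are assumptions for illustration, not the paper's implementation.

from typing import Callable, List, Tuple

# One in-context demonstration: (source sentence, draft from the compact STM,
# reference translation).
Triplet = Tuple[str, str, str]


def build_refinement_prompt(
    demos: List[Triplet],
    src_sentence: str,
    stm_translate: Callable[[str], str],  # e.g. a wrapper around an NLLB-600M checkpoint
    src_lang: str = "Xhosa",
    tgt_lang: str = "English",
) -> str:
    """Assemble a few-shot prompt whose demonstrations are triplets.

    The LLM sees (source, STM draft, reference) examples, then a new source
    sentence together with its STM draft, and is asked to produce a refined
    translation of the query sentence.
    """
    lines = [f"Refine the draft translation from {src_lang} into fluent {tgt_lang}."]
    for source, draft, reference in demos:
        lines += [
            f"{src_lang}: {source}",
            f"Draft: {draft}",
            f"{tgt_lang}: {reference}",
            "",
        ]
    # Query: the STM supplies a draft for the sentence we actually want translated;
    # the LLM completes the final line with the refined translation.
    lines += [
        f"{src_lang}: {src_sentence}",
        f"Draft: {stm_translate(src_sentence)}",
        f"{tgt_lang}:",
    ]
    return "\n".join(lines)


# Pivoting variant (also a sketch): for an X -> Y pair, an English-centric STM
# first produces an English draft of the X-language source, and the same template
# then asks the LLM for the English -> Y step, exploiting the LLM's
# English-dominant ability without any LLM fine-tuning.
```

Passing the returned string to any few-shot-capable LLM would then yield the refined translation; in this picture, continual learning amounts to swapping in an updated STM checkpoint while the LLM stays frozen.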
Related papers
- A Preference-driven Paradigm for Enhanced Translation with Large Language Models [33.51585908894444]
Large language models (LLMs) can achieve remarkable translation performance using only a small amount of parallel data.
Supervised fine-tuning (SFT) simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references.
We propose a preference-based approach built upon the Plackett-Luce model to overcome this limitation.
arXiv Detail & Related papers (2024-04-17T11:52:47Z)
- Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z)
- Speech Translation with Large Language Models: An Industrial Practice [64.5419534101104]
We introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained large language model (LLM).
By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations.
Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST.
arXiv Detail & Related papers (2023-12-21T05:32:49Z)
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks [12.665447518524187]
This study aims to perform a thorough evaluation of the non-English capabilities of SoTA LLMs by comparing them on the same set of multilingual datasets.
Our benchmark comprises 22 datasets covering 83 languages, including low-resource African languages.
We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks.
arXiv Detail & Related papers (2023-11-13T16:45:37Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
- Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z)
- Zero-Shot Cross-Lingual Summarization via Large Language Models [108.30673793281987]
Cross-lingual summarization (CLS) generates a summary of a source document in a different target language.
Recent emergence of Large Language Models (LLMs) has attracted wide attention from the computational linguistics community.
In this report, we empirically use various prompts to guide LLMs to perform zero-shot CLS from different paradigms.
arXiv Detail & Related papers (2023-02-28T01:27:37Z)