Breaking the Language Barrier: Can Direct Inference Outperform
Pre-Translation in Multilingual LLM Applications?
- URL: http://arxiv.org/abs/2403.04792v1
- Date: Mon, 4 Mar 2024 14:01:11 GMT
- Authors: Yotam Intrator, Matan Halfon, Roman Goldenberg, Reut Tsarfaty, Matan
Eyal, Ehud Rivlin, Yossi Matias, Natalia Aizenberg
- Abstract summary: This study re-evaluates the need for pre-translation in the context of PaLM2 models.
PaLM2-L consistently outperforms pre-translation in 94 out of 108 languages.
- Score: 17.828943682809882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models hold significant promise in multilingual applications.
However, inherent biases stemming from predominantly English-centric
pre-training have led to the widespread practice of pre-translation, i.e.,
translating non-English inputs to English before inference, leading to
complexity and information loss. This study re-evaluates the need for
pre-translation in the context of PaLM2 models (Anil et al., 2023), which have
been established as highly performant in multilingual tasks. We offer a
comprehensive investigation across 108 languages and 6 diverse benchmarks,
including open-end generative tasks, which were excluded from previous similar
studies. Our findings challenge the pre-translation paradigm established in
prior research, highlighting the advantages of direct inference in PaLM2.
Specifically, PaLM2-L consistently outperforms pre-translation in 94 out of 108
languages. These findings pave the way for more efficient and effective
multilingual applications, alleviating the limitations associated with
pre-translation and unlocking linguistic authenticity.
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
(2024-06-23T15:15:17Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment approach to bridge the gap between large language models' English and non-English performance.
Experiment results show that the question alignment approach can be used to boost multilingual performance across diverse reasoning scenarios.
To understand the mechanism of its success, we analyze representation space, chain-of-thought and translation data scales.
(2024-05-02T14:49:50Z)
- Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models [12.700783525558721]
English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks.
This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence labeling tasks.
(2024-02-28T15:15:39Z)
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [90.73517523001149]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct.
We propose different training strategies to build powerful xMR LLMs, named MathOctopus, which notably outperform conventional open-source LLMs.
(2023-10-31T08:09:20Z)
- Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities.
We propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings.
The results show that GPT exhibits highly translating-like behaviour in multilingual settings.
(2023-05-24T02:05:03Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
(2023-04-03T18:46:01Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
(2022-05-24T03:35:00Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
(2022-04-16T16:08:38Z)
- Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages [0.0]
Unsupervised sequence-segmentation performance can be transferred to extremely low-resource languages by pre-training a Masked Segmental Language Model multilingually.
We show that this transfer can be achieved by training over a collection of low-resource languages that are typologically similar (but phylogenetically unrelated) to the target language.
(2021-10-16T00:08:28Z)