Breaking the Language Barrier: Can Direct Inference Outperform
Pre-Translation in Multilingual LLM Applications?
- URL: http://arxiv.org/abs/2403.04792v1
- Date: Mon, 4 Mar 2024 14:01:11 GMT
- Title: Breaking the Language Barrier: Can Direct Inference Outperform
Pre-Translation in Multilingual LLM Applications?
- Authors: Yotam Intrator, Matan Halfon, Roman Goldenberg, Reut Tsarfaty, Matan
Eyal, Ehud Rivlin, Yossi Matias, Natalia Aizenberg
- Abstract summary: This study re-evaluates the need for pre-translation in the context of PaLM2 models.
Direct inference with PaLM2-L consistently outperforms pre-translation in 94 out of 108 languages.
- Score: 17.828943682809882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models hold significant promise in multilingual applications.
However, inherent biases stemming from predominantly English-centric
pre-training have led to the widespread practice of pre-translation, i.e.,
translating non-English inputs to English before inference, leading to
complexity and information loss. This study re-evaluates the need for
pre-translation in the context of PaLM2 models (Anil et al., 2023), which have
been established as highly performant in multilingual tasks. We offer a
comprehensive investigation across 108 languages and 6 diverse benchmarks,
including open-ended generative tasks, which were excluded from previous similar
studies. Our findings challenge the pre-translation paradigm established in
prior research, highlighting the advantages of direct inference in PaLM2.
Specifically, direct inference with PaLM2-L consistently outperforms pre-translation in 94 out of 108
languages. These findings pave the way for more efficient and effective
multilingual applications, alleviating the limitations associated with
pre-translation and unlocking linguistic authenticity.
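For context on the two setups being compared, the sketch below contrasts the pre-translation and direct-inference pipelines. It is a minimal illustration only: `translate_to_english` and `llm_generate` are hypothetical placeholders, not functions from the paper or from any specific API.

```python
# Minimal sketch of the two inference strategies compared in the paper.
# The function names are illustrative placeholders (assumptions), not real APIs.

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to a multilingual LLM (e.g., PaLM2)."""
    raise NotImplementedError

def translate_to_english(text: str, source_lang: str) -> str:
    """Placeholder for the MT system used in a pre-translation pipeline."""
    raise NotImplementedError

def pre_translation_inference(text: str, source_lang: str) -> str:
    # Pre-translation: first translate the input to English, then run the LLM.
    # The extra MT step adds complexity and can lose source-language nuance.
    english_text = translate_to_english(text, source_lang)
    return llm_generate(english_text)

def direct_inference(text: str) -> str:
    # Direct inference: feed the original-language input straight to the LLM.
    return llm_generate(text)
```

In the pre-translation setup, any errors or nuance lost in the MT step propagate into the LLM call, which is the complexity and information loss the abstract refers to; direct inference removes that intermediate step entirely.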
Related papers
- X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale [25.257770733168012]
Large language models (LLMs) have achieved remarkable success across various NLP tasks, yet their focus has predominantly been on English.
In this paper, we prioritize quality over scaling the number of languages, focusing on the multilingual machine translation task.
X-ALMA is a model designed with a commitment to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels.
arXiv Detail & Related papers (2024-10-04T03:17:27Z)
- PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment [68.20851615263953]
Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining.
The spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing.
We propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining.
arXiv Detail & Related papers (2024-07-23T06:59:53Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experimental results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models [12.700783525558721]
English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks.
This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence labeling tasks.
arXiv Detail & Related papers (2024-02-28T15:15:39Z)
- Don't Trust ChatGPT when Your Question is not in English: A Study of Multilingual Abilities and Types of LLMs [16.770697902481107]
Large Language Models (LLMs) have demonstrated exceptional natural language understanding abilities.
We propose a systematic way of qualifying the performance disparities of LLMs under multilingual settings.
The results show that GPT exhibits highly translation-like behaviour in multilingual settings.
arXiv Detail & Related papers (2023-05-24T02:05:03Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restoration task to bridge the gap between the pretraining and finetuning stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)