Extrapolating Large Language Models to Non-English by Aligning Languages
- URL: http://arxiv.org/abs/2308.04948v2
- Date: Mon, 9 Oct 2023 14:08:40 GMT
- Title: Extrapolating Large Language Models to Non-English by Aligning Languages
- Authors: Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian
Huang, Lingpeng Kong, Jiajun Chen, Lei Li
- Abstract summary: Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
- Score: 109.09051737966178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing large language models show disparate capability across different
languages, due to the imbalance in the training data. Their performance on
English tasks is often stronger than on tasks in other languages. In this
paper, we empower pre-trained LLMs on non-English languages by building
semantic alignment across languages. We first target individual
languages by performing cross-lingual instruction-tuning (CoIT) on LLaMA, i.e.
tuning it with translation task data and cross-lingual general task data to
obtain cross-lingual models (x-LLaMAs), and formulate underlying scaling laws
to investigate the advantages of using scalable translation data. Then we
perform multilingual instruction-tuning (MuIT) with mixed resources to build
multilingual m-LLaMA. We also illustrate how we leverage the scaling laws to
optimize data allocation in a resource-constrained setting. Experimental results
on the cross-lingual benchmarks XQuAD and MLQA show that x-LLaMAs surpass the
English instruction-tuned counterpart (Alpaca) by an average of 27.83% across
six non-English languages. Evaluation results on translation dataset Flores-101
show that x-LLaMAs outperform previous LLaMA-based models by an average of
18.89%. Encouragingly, m-LLaMA achieves comparable performance to x-LLaMAs on
individual languages and demonstrates the ability to follow multilingual
instructions. Further analysis on response content and representation space
reveals the alignment of the multilingual semantic space within the middle
layers of m-LLaMA.
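As an illustration of how such scaling laws can drive data allocation, here is a minimal Python sketch. The abstract does not give the exact functional form, so everything below is an assumption: it fits a saturating power law of task accuracy against translation-data size for each language, then greedily spends a fixed translation-data budget on the language with the largest predicted marginal gain. All language codes and numbers are hypothetical.

```python
# Illustrative sketch only: the abstract reports scaling laws relating
# translation-data size to downstream performance, but not their exact form.
# Here we ASSUME a saturating power law perf(n) = c - a * n**(-b), fit it per
# language, and greedily allocate a fixed translation-data budget.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, b, c):
    # Assumed functional form: performance saturates at c as data size n grows.
    return c - a * np.power(n, -b)

# Hypothetical pilot measurements: (translation-data sizes, task accuracies).
observations = {
    "zh": ([1e4, 5e4, 1e5, 2e5], [0.42, 0.51, 0.55, 0.58]),
    "ar": ([1e4, 5e4, 1e5, 2e5], [0.30, 0.41, 0.45, 0.49]),
}

fitted = {}
for lang, (sizes, scores) in observations.items():
    params, _ = curve_fit(
        scaling_law, sizes, scores,
        p0=[1.0, 0.3, 0.7], bounds=([0.0, 0.0, 0.0], [10.0, 2.0, 1.0]),
    )
    fitted[lang] = params

def allocate(budget, step=1e4):
    """Greedy allocation: repeatedly give the next chunk of translation data
    to the language with the largest predicted marginal gain."""
    alloc = {lang: 0.0 for lang in fitted}
    while budget >= step:
        gains = {
            lang: scaling_law(alloc[lang] + step, *p)
                  - scaling_law(max(alloc[lang], 1.0), *p)
            for lang, p in fitted.items()
        }
        best = max(gains, key=gains.get)
        alloc[best] += step
        budget -= step
    return alloc

print(allocate(budget=5e5))  # hypothetical budget of 500k translation pairs
```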
Related papers
- X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale [25.257770733168012]
Large language models (LLMs) have achieved remarkable success across various NLP tasks, yet their focus has predominantly been on English.
In this paper, we prioritize quality over scaling the number of languages, focusing on the multilingual machine translation task.
X-ALMA is a model designed with a commitment to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels.
arXiv Detail & Related papers (2024-10-04T03:17:27Z)
- Pruning Multilingual Large Language Models for Multilingual Inference [28.36717615166238]
This study explores how to enhance the zero-shot performance of MLLMs in non-English languages.
We first analyze the behavior of MLLMs when performing translation and reveal that large-magnitude features play a critical role in the translation process.
arXiv Detail & Related papers (2024-09-25T13:15:50Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners [67.85635044939836]
Large Language Models (LLMs) have shown impressive language capabilities.
In this work, we investigate the spontaneous multilingual alignment improvement of LLMs.
We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) encourage alignment between English and a wide range of languages.
arXiv Detail & Related papers (2024-05-22T16:46:19Z)
- Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation [25.850573463743352]
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks.
Yet significant performance disparities exist across different languages within the same mPLM.
We introduce ALSACE to leverage the learned knowledge from the well-performing languages to guide under-performing ones within the same mPLM.
arXiv Detail & Related papers (2024-04-12T14:19:16Z)
- Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages [60.162717568496355]
Large language models (LLMs) have been pre-trained on multilingual corpora.
Their performance still lags behind in most languages compared to a few resource-rich languages.
arXiv Detail & Related papers (2024-02-19T15:07:32Z)
- Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations [0.8133739801185272]
We propose CrossAlpaca, an instruction-tuned LLM (It-LLM) with cross-lingual instruction-following and translation-following demonstrations.
Our models, tested over six different languages, outperform the It-LLMs tuned on monolingual data.
arXiv Detail & Related papers (2023-08-27T19:22:12Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
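A minimal sketch of such a data curriculum follows. This is an assumption for illustration, not PolyLM's released recipe: the non-English sampling ratio is ramped from 30% to 60% over pre-training, and a linear ramp is assumed even though the abstract describes discrete stages.

```python
# Minimal sketch (assumed, not PolyLM's released training code): a curriculum
# sampler that raises the share of non-English data from 30% early in
# pre-training to 60% at the end. A linear ramp is assumed here; the abstract
# describes discrete stages, so the real schedule may be step-wise.
import random

def non_english_ratio(step, total_steps, start=0.30, end=0.60):
    """Interpolate the non-English sampling ratio over training progress."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

def sample_batch(step, total_steps, en_pool, non_en_pool, batch_size=8):
    """Draw a batch whose language mix follows the curriculum ratio."""
    n_non_en = round(batch_size * non_english_ratio(step, total_steps))
    batch = random.sample(non_en_pool, n_non_en) + \
            random.sample(en_pool, batch_size - n_non_en)
    random.shuffle(batch)
    return batch

# Toy usage with placeholder documents.
en_docs = [f"en_{i}" for i in range(100)]
zh_docs = [f"zh_{i}" for i in range(100)]
for step in (0, 5_000, 10_000):
    print(step, non_english_ratio(step, 10_000), sample_batch(step, 10_000, en_docs, zh_docs))
```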
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)