Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly
- URL: http://arxiv.org/abs/2404.04659v1
- Date: Sat, 6 Apr 2024 15:25:06 GMT
- Title: Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly
- Authors: Changjiang Gao, Hongda Hu, Peng Hu, Jiajun Chen, Jixing Li, Shujian Huang
- Abstract summary: Two approaches have been proposed to address this imbalance across languages: multilingual pretraining and multilingual instruction tuning.
We propose CLiKA to assess the cross-lingual knowledge alignment of LLMs at the Performance, Consistency, and Conductivity levels.
Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed.
- Score: 53.04368883943773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their strong ability to retrieve knowledge in English, current large language models show imbalanced abilities across languages. Two approaches are proposed to address this, i.e., multilingual pretraining and multilingual instruction tuning. However, whether and how such methods contribute to the cross-lingual knowledge alignment inside the models is unknown. In this paper, we propose CLiKA, a systematic framework to assess the cross-lingual knowledge alignment of LLMs at the Performance, Consistency and Conductivity levels, and explore the effect of multilingual pretraining and instruction tuning on the degree of alignment. Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed. Namely, continued pretraining improves the alignment of the target language at the cost of other languages, while mixed pretraining affects other languages less. Also, the overall cross-lingual knowledge alignment, especially at the conductivity level, is unsatisfactory for all tested LLMs, and neither multilingual pretraining nor instruction tuning can substantially improve cross-lingual knowledge conductivity.
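As an illustration of the first two assessment levels, the hypothetical sketch below tallies per-language accuracy (Performance) and pairwise answer agreement (Consistency) over a parallel factual QA set. The data layout, function name, and scoring rule are assumptions for illustration only, not the actual CLiKA implementation, and the Conductivity level (whether knowledge acquired in one language can be used in another) is not shown.

```python
from collections import defaultdict

def performance_and_consistency(results):
    """results: dict lang -> list of (question_id, is_correct) over a parallel QA set.

    Performance: per-language accuracy.
    Consistency: fraction of questions answered correctly in BOTH languages of a
    pair, relative to those correct in at least one (an illustrative rule).
    """
    performance = {
        lang: sum(ok for _, ok in answers) / len(answers)
        for lang, answers in results.items()
    }

    # Index correctness per question for pairwise comparison.
    correct = {lang: {qid: ok for qid, ok in answers} for lang, answers in results.items()}

    consistency = {}
    langs = sorted(results)
    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            shared = set(correct[a]) & set(correct[b])
            both = sum(correct[a][q] and correct[b][q] for q in shared)
            either = sum(correct[a][q] or correct[b][q] for q in shared)
            consistency[(a, b)] = both / either if either else 0.0

    return performance, consistency


# Toy usage: the same three questions asked in English and German.
results = {
    "en": [(1, True), (2, True), (3, False)],
    "de": [(1, True), (2, False), (3, False)],
}
perf, cons = performance_and_consistency(results)
print(perf)   # {'en': 0.667, 'de': 0.333} (approximately)
print(cons)   # {('de', 'en'): 0.5}
```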
Related papers
- PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment [68.20851615263953]
Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining.
The spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing.
We propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining.
arXiv Detail & Related papers (2024-07-23T06:59:53Z) - No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
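The summary above is terse, so here is a minimal, hypothetical sketch of what training-free post-processing of language adapters by "arithmetic" on their weights could look like. The combination rule, parameter names, and shapes are assumptions; the paper's actual language-arithmetic formulation may differ.

```python
import torch

def combine_adapters(adapters, weights):
    """Training-free 'language arithmetic' sketch (assumed formulation):
    build a new adapter as a weighted sum of existing language-adapter
    state dicts, analogous to task-vector arithmetic. This only illustrates
    the post-processing idea, not the paper's exact rule.
    """
    combined = {}
    for name in adapters[0]:
        combined[name] = sum(w * sd[name] for sd, w in zip(adapters, weights))
    return combined

# Hypothetical usage: strengthen a low-resource adapter with a related language.
low_resource = {"down.weight": torch.randn(16, 768), "up.weight": torch.randn(768, 16)}
related_lang = {"down.weight": torch.randn(16, 768), "up.weight": torch.randn(768, 16)}
merged = combine_adapters([low_resource, related_lang], weights=[0.7, 0.3])
print({k: v.shape for k, v in merged.items()})
```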
arXiv Detail & Related papers (2024-04-24T08:52:40Z) - Multilingual Instruction Tuning With Just a Pinch of Multilinguality [31.360147312195068]
We show that even monolingual instruction tuning, in many of the languages tested, transfers some instruction-following capabilities to other languages.
We observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages.
We find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization.
arXiv Detail & Related papers (2024-01-03T17:48:10Z) - Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? [40.13166574854085]
We investigate the minimal amount of multilinguality required to elicit cross-lingual generalisation in English-centric large language models.
We find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation.
arXiv Detail & Related papers (2023-12-20T00:49:52Z) - Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1‰ of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
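To make the contrastive-learning idea concrete, here is a minimal sketch of an InfoNCE-style loss over paired sentence representations of translations. The exact objective, pooling, and temperature used by the paper are not specified here, so treat this as an assumed, generic formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_reprs, tgt_reprs, temperature=0.05):
    """Multilingual contrastive loss over translation pairs (illustrative sketch).

    src_reprs, tgt_reprs: (batch, hidden) sentence representations of parallel
    sentences; row i of each tensor is a translation pair. Translations are
    pulled together, other in-batch sentences pushed apart (InfoNCE).
    """
    src = F.normalize(src_reprs, dim=-1)
    tgt = F.normalize(tgt_reprs, dim=-1)
    logits = src @ tgt.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0))          # positives sit on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Toy usage with random "sentence embeddings".
loss = contrastive_alignment_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```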
arXiv Detail & Related papers (2023-11-14T11:24:08Z) - Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions [68.01449013641532]
Large-scale Pretrained Language Models (LLMs) have shown strong abilities in multilingual translations.
We present a detailed analysis by finetuning a multilingual pretrained language model, XGLM-7B, to perform multilingual translation.
arXiv Detail & Related papers (2023-05-24T12:00:24Z) - VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose VECO 2.0, a cross-lingual pre-trained model based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary, and the other unpaired tokens within a bilingual instance.
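As a rough illustration of the token-level granularity, the sketch below contrasts each dictionary-marked synonym token against the other tokens of a bilingual instance. The indexing scheme and loss are illustrative assumptions rather than VECO 2.0's actual objective.

```python
import torch
import torch.nn.functional as F

def token_alignment_loss(token_reprs, synonym_pairs, temperature=0.05):
    """Token-to-token alignment sketch (illustrative, not VECO 2.0's exact loss).

    token_reprs: (num_tokens, hidden) contextual token representations of a
    concatenated bilingual instance. synonym_pairs: list of (i, j) index pairs
    whose tokens a thesaurus dictionary marks as synonyms. Each synonym token
    is contrasted against all other tokens in the instance.
    """
    reprs = F.normalize(token_reprs, dim=-1)
    sims = reprs @ reprs.T / temperature
    sims.fill_diagonal_(float("-inf"))          # a token is not its own positive
    anchors = torch.tensor([i for i, _ in synonym_pairs])
    positives = torch.tensor([j for _, j in synonym_pairs])
    return F.cross_entropy(sims[anchors], positives)

# Toy usage: tokens 1 and 5 are dictionary synonyms across the two sentences.
loss = token_alignment_loss(torch.randn(8, 64), [(1, 5), (5, 1)])
print(loss.item())
```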
arXiv Detail & Related papers (2023-04-17T12:23:41Z) - Cross-Lingual Language Model Meta-Pretraining [21.591492094502424]
We propose a cross-lingual language model meta-pretraining, which learns the two abilities in different training phases.
Our method improves both generalization and cross-lingual transfer, and produces better-aligned representations across different languages.
arXiv Detail & Related papers (2021-09-23T03:47:44Z) - Competence-based Curriculum Learning for Multilingual Machine Translation [28.30800327665549]
Existing multilingual machine translation models face a severe challenge: imbalance.
We propose Competence-based Curriculum Learning for Multilingual Machine Translation.
Our approach has achieved a steady and significant performance gain compared to the previous state-of-the-art approach on the TED talks dataset.
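As a toy sketch of what a competence-based curriculum could look like, the sampler below admits language pairs only once their estimated competence passes a threshold and then up-weights the weaker admitted pairs. The competence definition, threshold, and weighting here are assumptions, not the paper's actual schedule.

```python
import random

def curriculum_sampler(datasets, competence, threshold=0.3):
    """Competence-based curriculum sketch (assumed scheduling rule):
    only language pairs whose estimated competence exceeds a threshold are
    admitted to training; among admitted pairs, lower competence gets a
    higher sampling weight so lagging languages catch up.
    """
    admitted = {p: d for p, d in datasets.items() if competence[p] >= threshold}
    if not admitted:                      # early phase: fall back to the strongest pair
        best = max(competence, key=competence.get)
        admitted = {best: datasets[best]}
    weights = [1.0 - competence[p] + 1e-6 for p in admitted]
    pair = random.choices(list(admitted), weights=weights, k=1)[0]
    return pair, random.choice(admitted[pair])

# Hypothetical usage with toy corpora and competence estimates in [0, 1].
datasets = {"en-de": ["pair1", "pair2"], "en-ne": ["pair1"]}
competence = {"en-de": 0.8, "en-ne": 0.2}
print(curriculum_sampler(datasets, competence))
```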
arXiv Detail & Related papers (2021-09-09T02:52:34Z) - Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages, but it can perform poorly for distant or low-resource language pairs.
Previous research has shown that this is because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
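One way to picture using type-level cross-lingual subword embeddings at the lexical level is to initialize the masked language model's embedding table from a pre-aligned cross-lingual space, as in the hypothetical sketch below. How the paper actually injects this information is not detailed in the summary, so this is only an assumed variant.

```python
import numpy as np

def init_embeddings_from_crosslingual(vocab, xlingual_vecs, dim, rng=None):
    """Initialize a masked-LM embedding table from pre-trained, pre-aligned
    cross-lingual subword embeddings (illustrative sketch). Subwords covered
    by the cross-lingual space reuse those vectors; the rest are randomly
    initialized.
    """
    rng = rng or np.random.default_rng(0)
    table = rng.normal(scale=0.02, size=(len(vocab), dim)).astype(np.float32)
    for i, token in enumerate(vocab):
        if token in xlingual_vecs:
            table[i] = xlingual_vecs[token]
    return table

# Toy usage: two subwords have aligned cross-lingual vectors, two do not.
vocab = ["▁hell", "o", "▁wel", "t"]
xlingual_vecs = {"▁hell": np.ones(8, dtype=np.float32), "▁wel": np.full(8, 0.5, dtype=np.float32)}
emb = init_embeddings_from_crosslingual(vocab, xlingual_vecs, dim=8)
print(emb.shape)
```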
arXiv Detail & Related papers (2021-03-18T21:17:58Z)