Cross-Lingual Language Model Meta-Pretraining
- URL: http://arxiv.org/abs/2109.11129v1
- Date: Thu, 23 Sep 2021 03:47:44 GMT
- Title: Cross-Lingual Language Model Meta-Pretraining
- Authors: Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xian-Ling Mao
- Abstract summary: We propose cross-lingual language model meta-pretraining, which learns generalization ability and cross-lingual transferability in separate training phases.
Our method improves both generalization and cross-lingual transfer, and produces better-aligned representations across different languages.
- Score: 21.591492094502424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of pretrained cross-lingual language models relies on two
essential abilities, i.e., generalization ability for learning downstream tasks
in a source language, and cross-lingual transferability for transferring the
task knowledge to other languages. However, current methods jointly learn the
two abilities in a single-phase cross-lingual pretraining process, resulting in
a trade-off between generalization and cross-lingual transfer. In this paper,
we propose cross-lingual language model meta-pretraining, which learns the two
abilities in different training phases. Our method introduces an additional
meta-pretraining phase before cross-lingual pretraining, where the model learns
generalization ability on a large-scale monolingual corpus. Then, the model
focuses on learning cross-lingual transfer on a multilingual corpus.
Experimental results show that our method improves both generalization and
cross-lingual transfer, and produces better-aligned representations across
different languages.
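The core of the method is the two-phase schedule: first learn generalization on monolingual text, then learn cross-lingual transfer on multilingual text. Below is a minimal sketch of that schedule, assuming a generic PyTorch masked-language model; `monolingual_loader` and `multilingual_loader` are hypothetical placeholders, and none of this is the authors' released implementation.

```python
import torch

def run_mlm_phase(model, loader, optimizer, steps, device="cpu"):
    """Train with a masked-language-modeling objective for a fixed budget."""
    model.train()
    batches = iter(loader)
    for _ in range(steps):
        try:
            inputs, labels = next(batches)
        except StopIteration:  # restart the corpus when exhausted
            batches = iter(loader)
            inputs, labels = next(batches)
        optimizer.zero_grad()
        logits = model(inputs.to(device))  # (batch, seq, vocab)
        # Standard MLM cross-entropy; non-masked positions carry the
        # ignore_index label (-100) and contribute no loss.
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)),
            labels.view(-1).to(device),
            ignore_index=-100,
        )
        loss.backward()
        optimizer.step()

# Phase 1 (meta-pretraining): learn generalization ability on a
# large-scale monolingual corpus.
#   run_mlm_phase(model, monolingual_loader, optimizer, steps=200_000)
# Phase 2 (cross-lingual pretraining): continue from the phase-1 weights
# and learn cross-lingual transfer on a multilingual corpus.
#   run_mlm_phase(model, multilingual_loader, optimizer, steps=200_000)
```

The point of the split is that the optimizer never has to trade the two objectives off against each other within a single phase.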
Related papers
- mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models? [15.90185747024602]
We propose a synthetic task, Multilingual Othello (mOthello), as a testbed to study when cross-lingual representation alignment and cross-lingual transfer emerge in multilingual models.
We find that models trained with naive multilingual pretraining fail to learn a language-neutral representation across all input languages.
We propose a novel approach - multilingual pretraining with unified output space - that both induces the learning of language-neutral representation and facilitates cross-lingual transfer.
arXiv Detail & Related papers (2024-04-18T18:03:08Z)
- Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning? [8.630930380973489]
This paper investigates the potential benefits of employing machine translation as a continued training objective to enhance language representation learning.
Our results show that, contrary to expectations, machine translation as a continued training objective fails to enhance cross-lingual representation learning.
We conclude that explicit sentence-level alignment during cross-lingual pretraining is detrimental to cross-lingual transfer.
arXiv Detail & Related papers (2024-03-25T13:53:04Z)
- Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1‰ of the pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
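Multilingual contrastive learning over translation pairs is typically an InfoNCE-style objective: a sentence and its translation are positives, and the other sentences in the batch serve as negatives. A minimal sketch under that assumption (the encoder and pooling are stand-ins, not this paper's exact setup):

```python
import torch
import torch.nn.functional as F

def translation_contrastive_loss(src_emb, tgt_emb, temperature=0.05):
    """InfoNCE over a batch of translation pairs.

    src_emb, tgt_emb: (batch, dim) sentence embeddings, where row i of
    tgt_emb is the translation of row i of src_emb.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    # Similarity of every source sentence to every target sentence.
    logits = src @ tgt.t() / temperature
    # The matching translation sits on the diagonal.
    targets = torch.arange(src.size(0), device=src.device)
    # Contrast in both directions, as is common for paired data.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Example with random embeddings standing in for encoder outputs:
loss = translation_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```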
arXiv Detail & Related papers (2023-11-14T11:24:08Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
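Checkpoint probing of this kind typically fits a small classifier on frozen representations and tracks its accuracy across pretraining steps. A minimal sketch, assuming features have already been extracted from a frozen checkpoint (the paper's probing suite is not reproduced here):

```python
import torch

def train_linear_probe(features, labels, num_classes, epochs=10, lr=1e-2):
    """Fit a linear classifier on frozen features from one checkpoint.

    features: (n, dim) representations extracted with the encoder frozen.
    labels:   (n,) gold labels for the linguistic task.
    """
    probe = torch.nn.Linear(features.size(-1), num_classes)
    optimizer = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(probe(features), labels)
        loss.backward()
        optimizer.step()
    # Accuracy on the probe's own training set; in practice,
    # evaluate on held-out data.
    accuracy = (probe(features).argmax(-1) == labels).float().mean()
    return probe, accuracy.item()

# Toy example: 100 frozen 768-d representations, 5-way task.
probe, acc = train_linear_probe(torch.randn(100, 768),
                                torch.randint(0, 5, (100,)), num_classes=5)
```

Running the same probe over checkpoints from different pretraining steps yields the learning curves the analysis is based on.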
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
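Continual-learning desiderata of this kind are usually read off an accuracy matrix R, where R[i][j] is performance on language j after training through the i-th language in the sequence. A sketch using the standard forgetting and forward-transfer definitions (not necessarily this paper's exact metrics):

```python
def forgetting_and_transfer(R):
    """Summarize a continual-learning run over a language sequence.

    R: list of lists; R[i][j] is accuracy on language j after the model
    has been trained sequentially through language i.
    """
    n = len(R)
    # Forgetting: how much accuracy on earlier languages drops by the end,
    # relative to the best accuracy ever reached on each of them.
    forgetting = sum(max(R[i][j] for i in range(n)) - R[n - 1][j]
                     for j in range(n - 1)) / (n - 1)
    # Forward transfer proxy: accuracy on each language just before
    # training on it (zero-shot from the languages seen so far).
    forward = sum(R[j - 1][j] for j in range(1, n)) / (n - 1)
    return forgetting, forward

# Toy 3-language run: row i = accuracies after training on language i.
R = [[0.80, 0.40, 0.30],
     [0.70, 0.85, 0.45],
     [0.65, 0.75, 0.90]]
print(forgetting_and_transfer(R))  # (0.125, 0.425)
```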
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
- Phylogeny-Inspired Adaptation of Multilingual Models to New Languages [43.62238334380897]
We show how we can use language phylogenetic information to improve cross-lingual transfer by leveraging closely related languages.
We perform adapter-based training on languages from diverse language families (Germanic, Uralic, Tupian, Uto-Aztecan) and evaluate on both syntactic and semantic tasks.
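Adapter-based training inserts small bottleneck modules into a frozen pretrained encoder, one per language (or shared within a family), so only the adapter parameters are updated. A minimal bottleneck-adapter sketch; the phylogeny-driven parameter-sharing scheme itself is not shown:

```python
import torch

class BottleneckAdapter(torch.nn.Module):
    """Down-project, non-linearity, up-project, residual connection."""

    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = torch.nn.Linear(hidden_dim, bottleneck_dim)
        self.up = torch.nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states):
        # The residual connection keeps the frozen encoder's signal intact.
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))

# One adapter per language; only adapter parameters receive gradients.
adapters = {lang: BottleneckAdapter() for lang in ("fi", "et", "hu")}  # Uralic
out = adapters["fi"](torch.randn(2, 16, 768))  # (batch, seq, hidden)
```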
arXiv Detail & Related papers (2022-05-19T15:49:19Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
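Studies of this kind often isolate the contribution of word order by training or evaluating on order-ablated text. A generic sketch of such an ablation, offered purely as illustration (the paper's exact manipulations may differ):

```python
import random

def ablate_word_order(sentence, seed=0):
    """Destroy order information while preserving word co-occurrence."""
    words = sentence.split()
    rng = random.Random(seed)  # fixed seed for a reproducible corpus
    rng.shuffle(words)
    return " ".join(words)

print(ablate_word_order("the cat sat on the mat"))
```

Comparing transfer from the original and the ablated corpus then quantifies how much order information the model actually relies on.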
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degeneration in which masked words are predicted conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
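Plugging a cross-attention module into an encoder layer means each language's tokens can attend to the parallel sentence in the other language, rather than only to their own context. A minimal sketch of such a layer (dimensions and wiring are illustrative, not VECO's exact architecture):

```python
import torch

class CrossAttentionLayer(torch.nn.Module):
    """Encoder layer extended with cross-attention to a second language."""

    def __init__(self, hidden_dim=768, num_heads=12):
        super().__init__()
        self.self_attn = torch.nn.MultiheadAttention(hidden_dim, num_heads,
                                                     batch_first=True)
        self.cross_attn = torch.nn.MultiheadAttention(hidden_dim, num_heads,
                                                      batch_first=True)
        self.norm1 = torch.nn.LayerNorm(hidden_dim)
        self.norm2 = torch.nn.LayerNorm(hidden_dim)

    def forward(self, x, other):
        # Usual self-attention over the sentence's own tokens.
        x = self.norm1(x + self.self_attn(x, x, x)[0])
        # Cross-attention: queries from this language, keys/values from
        # the parallel sentence, building explicit interdependence.
        x = self.norm2(x + self.cross_attn(x, other, other)[0])
        return x

layer = CrossAttentionLayer()
src, tgt = torch.randn(2, 16, 768), torch.randn(2, 20, 768)
out = layer(src, tgt)  # (2, 16, 768)
```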
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train on the pretext tasks to improve the cross-lingual transferability of pre-trained models.
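The contrastive pre-training task is an instance of the InfoNCE objective: an anchor sentence should score higher with its translation than with negative samples. In the standard form (notation mine, not copied from the paper), with encoder $f$, temperature $\tau$, translation pair $(c, x^{+})$, and candidate set $\{x_1, \dots, x_N\}$ containing $x^{+}$:

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\,\mathbb{E}\!\left[
      \log \frac{\exp\left(f(c)^{\top} f(x^{+}) / \tau\right)}
                {\sum_{i=1}^{N} \exp\left(f(c)^{\top} f(x_i) / \tau\right)}
    \right]
```

Minimizing this loss maximizes a lower bound on the mutual information between parallel sentences, which is the information-theoretic view the framework builds on.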
arXiv Detail & Related papers (2020-07-15T16:58:01Z)