Discovering Representation Sprachbund For Multilingual Pre-Training
- URL: http://arxiv.org/abs/2109.00271v1
- Date: Wed, 1 Sep 2021 09:32:06 GMT
- Title: Discovering Representation Sprachbund For Multilingual Pre-Training
- Authors: Yimin Fan, Yaobo Liang, Alexandre Muzio, Hany Hassan, Houqiang Li,
Ming Zhou and Nan Duan
- Abstract summary: We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
- Score: 139.05668687865688
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multilingual pre-trained models have demonstrated their effectiveness in many
multilingual NLP tasks and enabled zero-shot or few-shot transfer from
high-resource languages to low-resource ones. However, due to significant
typological differences and contradictions between some languages, such models
usually perform poorly on many languages and cross-lingual settings, which
shows the difficulty of learning a single model to handle massively diverse
languages well at the same time. To alleviate this issue, we present a new
multilingual pre-training pipeline. We propose to generate language
representation from multilingual pre-trained models and conduct linguistic
analysis to show that language representation similarity reflects linguistic
similarity from multiple perspectives, including language family, geographical
sprachbund, lexicostatistics and syntax. Then we cluster all the target
languages into multiple groups and name each group as a representation
sprachbund. Thus, languages in the same representation sprachbund are expected
to boost each other in both pre-training and fine-tuning, as they share rich
linguistic similarities. We pre-train one multilingual model for each
representation sprachbund. Experiments are conducted on cross-lingual
benchmarks and significant improvements are achieved compared to strong
baselines.
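As a rough illustration of the pipeline described in the abstract, the sketch below derives one embedding per language from a multilingual encoder and clusters the languages with k-means, so that each cluster plays the role of a representation sprachbund. The model name, the sample sentences, the mean-pooling strategy, and the choice of k-means are assumptions made for this example, not necessarily the authors' exact design.

```python
# Minimal sketch: language embeddings from a multilingual encoder + clustering.
# Illustrative only; the concrete choices below are assumptions, not the paper's.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans

MODEL_NAME = "xlm-roberta-base"  # any multilingual encoder would do here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical per-language samples (in practice: many sentences per language).
samples = {
    "en": ["The weather is nice today.", "She reads a book."],
    "de": ["Das Wetter ist heute schön.", "Sie liest ein Buch."],
    "es": ["El tiempo es agradable hoy.", "Ella lee un libro."],
    "zh": ["今天天气很好。", "她在看书。"],
}

def language_vector(sentences):
    """Mean-pooled encoder states, averaged over sentences, as a language embedding."""
    vecs = []
    with torch.no_grad():
        for s in sentences:
            inputs = tokenizer(s, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
            vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.mean(vecs, axis=0)

langs = list(samples)
X = np.stack([language_vector(samples[l]) for l in langs])

# Cluster languages; each cluster stands in for a "representation sprachbund",
# and one multilingual model would then be pre-trained per cluster.
k = 2  # number of groups, chosen arbitrarily for this toy example
labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
for lang, label in zip(langs, labels):
    print(f"{lang} -> representation sprachbund {label}")
```

In a realistic setting the language vectors would be computed from large comparable corpora, and the number of clusters would be tuned rather than fixed in advance.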
Related papers
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation in that setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z) - Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1‰ of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models (a toy sketch of this kind of contrastive alignment appears after this list).
arXiv Detail & Related papers (2023-11-14T11:24:08Z) - The Less the Merrier? Investigating Language Representation in
Multilingual Models [8.632506864465501]
We investigate the linguistic representation of different languages in multilingual models.
We observe from our experiments that community-centered models perform better at distinguishing between languages in the same family for low-resource languages.
arXiv Detail & Related papers (2023-10-20T02:26:34Z) - Discovering Language-neutral Sub-networks in Multilingual Language
Models [15.94622051535847]
Language neutrality of multilingual models is a function of the overlap between language-encoding sub-networks of these models.
Using mBERT as a foundation, we employ the lottery ticket hypothesis to discover sub-networks that are individually optimized for various languages and tasks.
We conclude that mBERT comprises a language-neutral sub-network shared among many languages, along with multiple ancillary language-specific sub-networks.
arXiv Detail & Related papers (2022-05-25T11:35:41Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of
Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z) - Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability in the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z) - How Good is Your Tokenizer? On the Monolingual Performance of
Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
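As referenced in the "Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment" entry above, the following is a minimal sketch of contrastive alignment over translation pairs using an InfoNCE-style objective. The pooling, batch construction, and temperature value are placeholders assumed for the example rather than that paper's implementation.

```python
# Toy sketch of cross-lingual contrastive alignment over translation pairs.
# Each source sentence embedding is pulled toward its translation and pushed
# away from the other targets in the batch (symmetric InfoNCE). Placeholder
# dimensions and temperature; not the cited paper's implementation.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb, tgt_emb: (batch, dim) sentence embeddings of parallel sentences."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature           # (batch, batch) similarities
    targets = torch.arange(src.size(0), device=src.device)
    # Symmetric loss: match src -> tgt and tgt -> src
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with random tensors standing in for pooled encoder outputs:
src = torch.randn(8, 768, requires_grad=True)
tgt = torch.randn(8, 768, requires_grad=True)
loss = contrastive_alignment_loss(src, tgt)
loss.backward()
print(float(loss))
```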