Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent
Neural Networks
- URL: http://arxiv.org/abs/2003.14056v3
- Date: Wed, 14 Apr 2021 11:26:21 GMT
- Title: Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent
Neural Networks
- Authors: Prajit Dhar and Arianna Bisazza
- Abstract summary: It is now established that modern neural language models can be successfully trained on multiple languages simultaneously.
But what kind of knowledge is really shared among languages within these models?
In this paper we dissect different forms of cross-lingual transfer and look for its most determining factors.
We find that exposing our LMs to a related language does not always increase grammatical knowledge in the target language, and that optimal conditions for lexical-semantic transfer may not be optimal for syntactic transfer.
- Score: 3.9342247746757435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is now established that modern neural language models can be successfully
trained on multiple languages simultaneously without changes to the underlying
architecture. But what kind of knowledge is really shared among languages
within these models? Does multilingual training mostly lead to an alignment of
the lexical representation spaces or does it also enable the sharing of purely
grammatical knowledge? In this paper we dissect different forms of
cross-lingual transfer and look for its most determining factors, using a
variety of models and probing tasks. We find that exposing our LMs to a related
language does not always increase grammatical knowledge in the target language,
and that optimal conditions for lexical-semantic transfer may not be optimal
for syntactic transfer.
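To make the probing setup concrete, here is a minimal sketch of the kind of grammaticality probe used to measure syntactic knowledge: a language model scores minimal pairs that differ only in subject-verb agreement, and syntactic ability is the fraction of pairs where the grammatical variant receives the higher score. The bigram scorer and the toy sentence pairs are hypothetical placeholders for the paper's trained multilingual LMs and real challenge sets.

```python
import math
from collections import Counter

# Toy corpus standing in for what the LM has learned (hypothetical data).
tokens = "the boy greets the girls . the boys greet the girl .".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
vocab = len(unigrams)

def log_prob(sentence: str) -> float:
    """Add-one-smoothed bigram score; a stand-in for a multilingual LM's sentence score."""
    words = sentence.split()
    return sum(
        math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab))
        for w1, w2 in zip(words, words[1:])
    )

# Minimal pairs differing only in subject-verb agreement: (grammatical, ungrammatical).
pairs = [
    ("the boy greets the girls", "the boy greet the girls"),
    ("the boys greet the girl", "the boys greets the girl"),
]

accuracy = sum(log_prob(good) > log_prob(bad) for good, bad in pairs) / len(pairs)
print(f"agreement accuracy: {accuracy:.2f}")  # 1.00 on this toy setup
```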
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models? [15.90185747024602]
We propose a synthetic task, Multilingual Othello (mOthello), as a testbed for two questions: when cross-lingual representation alignment emerges and when cross-lingual transfer emerges in multilingual models.
We find that models trained with naive multilingual pretraining fail to learn a language-neutral representation across all input languages.
We propose a novel approach - multilingual pretraining with unified output space - that both induces the learning of language-neutral representation and facilitates cross-lingual transfer.
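As a rough illustration of the unified-output-space idea (a sketch under assumptions, not the paper's implementation): inputs keep language-specific embeddings, but every language shares a single output projection, so predictions from different languages land in the same space. The vocabularies and the mapping below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical token inventories for two languages mapped onto one shared
# output vocabulary (a loose analogue of a "unified output space").
shared_vocab = ["cat", "dog", "house"]
lang_to_shared = {
    ("en", "cat"): 0, ("en", "dog"): 1, ("en", "house"): 2,
    ("es", "gato"): 0, ("es", "perro"): 1, ("es", "casa"): 2,
}

d = 8
# Separate input embeddings per language...
embed = {lang_tok: rng.normal(size=d) for lang_tok in lang_to_shared}
# ...but a single output projection shared by every language.
W_out = rng.normal(size=(len(shared_vocab), d))

def predict_shared(lang: str, token: str) -> np.ndarray:
    """Distribution over the shared output vocabulary, regardless of input language."""
    h = embed[(lang, token)]          # language-specific encoding
    logits = W_out @ h                # shared output head
    return np.exp(logits) / np.exp(logits).sum()

print(predict_shared("en", "dog"))
print(predict_shared("es", "perro"))  # same output space, so distributions are comparable
```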
arXiv Detail & Related papers (2024-04-18T18:03:08Z)
- Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization [27.368684663279463]
We investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization.
Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability.
We propose a meta-learning-based method to learn to align conceptual spaces of different languages.
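As a simplified stand-in for the paper's meta-learning method, the sketch below aligns two languages' concept spaces with a closed-form orthogonal Procrustes rotation; the concept embeddings are synthetic, and the choice of Procrustes is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings of the same k syntactic concepts in two languages.
k, d = 10, 16
concepts_l1 = rng.normal(size=(k, d))
true_rotation = np.linalg.qr(rng.normal(size=(d, d)))[0]
concepts_l2 = concepts_l1 @ true_rotation + 0.01 * rng.normal(size=(k, d))

# Orthogonal Procrustes: find the rotation W minimizing ||L1 @ W - L2||_F.
u, _, vt = np.linalg.svd(concepts_l1.T @ concepts_l2)
W = u @ vt

aligned = concepts_l1 @ W
print("residual before alignment:", np.linalg.norm(concepts_l1 - concepts_l2))
print("residual after alignment: ", np.linalg.norm(aligned - concepts_l2))
```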
arXiv Detail & Related papers (2023-10-19T14:50:51Z)
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
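One straightforward way to compare such relation distributions across languages (not necessarily the authors' exact metric) is to normalize per-language relation counts and measure their divergence; the relation inventory and counts below are invented placeholders.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical frequencies of a few grammatical relations induced for two languages.
relations = ["nsubj", "obj", "amod", "case"]
lang_a = np.array([120, 90, 60, 30], dtype=float)
lang_b = np.array([100, 70, 50, 80], dtype=float)

# Normalize to probability distributions and compare.
p, q = lang_a / lang_a.sum(), lang_b / lang_b.sum()
print("Jensen-Shannon distance:", jensenshannon(p, q))
```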
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation [15.32063273544696]
We discretize the latent space of multilingual models by assigning encoder states to entries in a codebook.
We validate our approach on large-scale experiments with realistic data volumes and domains.
We also use the learned artificial language to analyze model behavior, and discover that using a similar bridge language increases knowledge-sharing among the remaining languages.
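The core discretization step can be sketched as vector quantization: each continuous encoder state is replaced by its nearest entry in a codebook, and the sequence of entry indices forms the "artificial language". The random states and codebook below are illustrative stand-ins; a real system would learn the codebook rather than sample it randomly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical continuous encoder states (5 positions, dim 8) and a small codebook.
states = rng.normal(size=(5, 8))
codebook = rng.normal(size=(16, 8))   # 16 "artificial language" tokens

# Nearest-neighbour assignment under Euclidean distance.
dists = np.linalg.norm(states[:, None, :] - codebook[None, :, :], axis=-1)
codes = dists.argmin(axis=1)          # discrete token id per encoder state
quantized = codebook[codes]           # states replaced by their codebook entries

print("assigned codes:", codes)
```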
arXiv Detail & Related papers (2022-11-02T17:14:42Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contributions of constituent order and word co-occurrence are limited, while composition is more crucial to the success of cross-lingual transfer.
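A generic way to probe the contribution of one such property, shown here purely as an illustration rather than the authors' exact procedure, is to build a corpus variant that destroys that property while preserving the others, e.g. shuffling word order within each sentence so co-occurrence survives but constituent order does not.

```python
import random

random.seed(0)

def shuffle_order(sentence: str) -> str:
    """Keep a sentence's words (co-occurrence intact) but discard their order."""
    words = sentence.split()
    random.shuffle(words)
    return " ".join(words)

corpus = ["the cat sat on the mat", "she reads books every day"]
shuffled_corpus = [shuffle_order(s) for s in corpus]
print(shuffled_corpus)
```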
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
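A minimal sketch of the grouping step, assuming one vector per language extracted from the pre-trained model (the vectors below are random placeholders): cluster the language vectors and treat each cluster as a representation sprachbund.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Hypothetical language vectors, one per language, standing in for the
# representations extracted from a multilingual pre-trained model.
languages = ["en", "de", "nl", "es", "it", "pt", "hi", "ur"]
vectors = rng.normal(size=(len(languages), 32))

# Group languages into "representation sprachbunds" by clustering their vectors.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)
for lang, label in zip(languages, labels):
    print(lang, "-> cluster", label)
```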
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degenerate case of predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
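As a generic illustration of cross-attention between languages (a sketch, not VECO's actual module), the code below lets each position in one language's encoder states attend over another language's states with single-head scaled dot-product attention.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8

def cross_attention(queries_x, states_y, Wq, Wk, Wv):
    """Single-head scaled dot-product attention from language X over language Y."""
    q = queries_x @ Wq
    k = states_y @ Wk
    v = states_y @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical encoder states for a sentence pair in two languages.
states_x = rng.normal(size=(5, d))   # language X, 5 tokens
states_y = rng.normal(size=(7, d))   # language Y, 7 tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Each X position attends over Y, building explicit cross-lingual interdependence.
print(cross_attention(states_x, states_y, Wq, Wk, Wv).shape)  # (5, 8)
```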
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
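A compact sketch of singular vector canonical correlation analysis (SVCCA) on two hypothetical views of a set of languages: each view is first reduced with an SVD, then CCA measures how strongly the reduced views correlate. The data here is synthetic; the paper applies the analysis to its actual sources of language information.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(5)

# Two hypothetical "views" of the same 40 languages (random, purely for illustration).
n_langs = 40
view_a = rng.normal(size=(n_langs, 20))
view_b = view_a @ rng.normal(size=(20, 30)) + 0.1 * rng.normal(size=(n_langs, 30))

def svcca(x, y, keep=10, n_components=5):
    """SVCCA sketch: SVD to denoise each view, then CCA on the reduced views."""
    def top_directions(m, k):
        m = m - m.mean(axis=0)
        u, s, _ = np.linalg.svd(m, full_matrices=False)
        return u[:, :k] * s[:k]
    xa, ya = top_directions(x, keep), top_directions(y, keep)
    ca, cb = CCA(n_components=n_components).fit_transform(xa, ya)
    corrs = [np.corrcoef(ca[:, i], cb[:, i])[0, 1] for i in range(n_components)]
    return float(np.mean(corrs))

print("mean canonical correlation:", round(svcca(view_a, view_b), 3))
```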
arXiv Detail & Related papers (2020-04-30T16:25:39Z)