When is BERT Multilingual? Isolating Crucial Ingredients for
Cross-lingual Transfer
- URL: http://arxiv.org/abs/2110.14782v1
- Date: Wed, 27 Oct 2021 21:25:39 GMT
- Title: When is BERT Multilingual? Isolating Crucial Ingredients for
Cross-lingual Transfer
- Authors: Ameet Deshpande, Partha Talukdar, Karthik Narasimhan
- Abstract summary: We show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order.
There is a strong correlation between transfer performance and word embedding alignment between languages.
Our results call for focus in multilingual models on explicitly improving word embedding alignment between languages.
- Score: 15.578267998149743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent work on multilingual language models has demonstrated their
capacity for cross-lingual zero-shot transfer on downstream tasks, there is a
lack of consensus in the community as to what shared properties between
languages enable such transfer. Analyses involving pairs of natural languages
are often inconclusive and contradictory since languages simultaneously differ
in many linguistic aspects. In this paper, we perform a large-scale empirical
study to isolate the effects of various linguistic properties by measuring
zero-shot transfer between four diverse natural languages and their
counterparts constructed by modifying aspects such as the script, word order,
and syntax. Among other things, our experiments show that the absence of
sub-word overlap significantly affects zero-shot transfer when languages differ
in their word order, and there is a strong correlation between transfer
performance and word embedding alignment between languages (e.g., R=0.94 on the
task of NLI). Our results call for focus in multilingual models on explicitly
improving word embedding alignment between languages rather than relying on its
implicit emergence.
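To make the alignment finding concrete, below is a minimal sketch (not the authors' code) of one common way to measure word embedding alignment between two languages: Procrustes-align embeddings for a bilingual word list and take the mean cosine similarity. All data and names in the sketch are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): word embedding alignment between two
# languages as mean cosine similarity after an orthogonal Procrustes mapping.
# All data below is synthetic; in practice src/tgt would hold embeddings of
# translation word pairs extracted from a multilingual model such as mBERT.
import numpy as np

def alignment_score(src_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    """Procrustes-align src to tgt, then return the mean cosine similarity."""
    src = src_emb - src_emb.mean(axis=0)
    tgt = tgt_emb - tgt_emb.mean(axis=0)
    # Orthogonal map W minimizing ||src @ W - tgt||_F (closed form via SVD)
    u, _, vt = np.linalg.svd(src.T @ tgt)
    mapped = src @ (u @ vt)
    cos = (mapped * tgt).sum(axis=1) / (
        np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1) + 1e-9
    )
    return float(cos.mean())

# Toy check: a rotated, increasingly noisy copy of the source embeddings stands
# in for the "other language"; more noise should mean lower alignment.
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 64))
rotation = np.linalg.qr(rng.normal(size=(64, 64)))[0]
for noise in (0.0, 0.5, 2.0):
    tgt = src @ rotation + noise * rng.normal(size=src.shape)
    print(f"noise={noise}: alignment={alignment_score(src, tgt):.3f}")
```

Across many language pairs, one would then correlate such alignment scores with zero-shot transfer accuracy (e.g., Pearson's R via scipy.stats.pearsonr), which is the kind of correlation the abstract reports (R=0.94 for NLI).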
Related papers
- Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.
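As a rough illustration of how such properties can be isolated, the sketch below derives modified copies of a toy corpus: one with word order inverted within each sentence and one with words shuffled randomly, both of which keep word co-occurrence statistics while destroying constituent order. Training on such counterparts is the general idea; the corpus and function names here are illustrative assumptions, not either paper's exact procedure.

```python
# Minimal sketch (illustrative, not either paper's exact procedure): derive
# modified "counterpart" corpora that keep word co-occurrence but alter order.
import random

def invert_order(sentence: str) -> str:
    """Reverse the word order within a sentence (deterministic permutation)."""
    return " ".join(reversed(sentence.split()))

def shuffle_order(sentence: str, seed: int = 0) -> str:
    """Randomly permute the words within a sentence."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

corpus = ["the cat sat on the mat", "multilingual models transfer across languages"]
for sent in corpus:
    print(sent, "|", invert_order(sent), "|", shuffle_order(sent))
```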
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- Analyzing the Effects of Reasoning Types on Cross-Lingual Transfer Performance [10.33152983955968]
Examples in Natural Language Inference (NLI) often pertain to various types of sub-tasks, requiring different kinds of reasoning.
Certain types of reasoning have proven to be more difficult to learn in a monolingual context.
We statistically observe interesting effects of the confluence of reasoning types and language similarities on transfer performance.
arXiv Detail & Related papers (2021-10-05T22:36:46Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
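A minimal sketch of the general recipe (the pooling choice, toy data, and cluster count are assumptions, not the paper's exact setup): represent each language by pooling embeddings from a multilingual model, then cluster the language vectors into groups.

```python
# Minimal sketch (assumptions, not the paper's pipeline): derive one vector per
# language by mean-pooling multilingual sentence embeddings, then cluster the
# languages into groups ("representation sprachbunds") with k-means.
import numpy as np
from sklearn.cluster import KMeans

def language_vector(sentence_embeddings: np.ndarray) -> np.ndarray:
    """One vector per language: the mean of its sentence embeddings."""
    return sentence_embeddings.mean(axis=0)

# Toy stand-in: random "sentence embeddings" for a handful of languages.
rng = np.random.default_rng(0)
languages = ["en", "de", "hi", "ur", "zh", "ja"]
lang_vecs = np.stack([language_vector(rng.normal(size=(100, 32))) for _ in languages])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(lang_vecs)
for lang, group in zip(languages, kmeans.labels_):
    print(lang, "-> cluster", group)
```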
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for transfer and can be reinitialized during fine-tuning.
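A minimal sketch of the kind of intervention this view suggests, assuming an arbitrary 8/4 layer split and the HuggingFace transformers API: keep the lower ("multilingual encoder") layers of mBERT and re-draw the weights of the upper ("task predictor") layers before fine-tuning.

```python
# Minimal sketch (assumed 8/4 layer split; uses HuggingFace transformers):
# reinitialize the upper "task predictor" layers of mBERT while keeping the
# lower "multilingual encoder" layers, then fine-tune on the source language.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3  # e.g., XNLI labels
)

ENCODER_LAYERS = 8  # assumption: layers 0-7 act as encoder, 8-11 as predictor
for layer in model.bert.encoder.layer[ENCODER_LAYERS:]:
    layer.apply(model._init_weights)  # re-draw weights from the init distribution

# ...then fine-tune as usual; only the retained lower layers carry the
# pretrained multilingual alignment.
```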
arXiv Detail & Related papers (2021-01-26T22:12:38Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
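A minimal sketch (not VECO's actual implementation) of the idea of plugging a cross-attention module into an encoder layer, so that predicting a masked word in one language can also condition on the parallel sentence in the other language; the dimensions and layer layout are assumptions.

```python
# Minimal sketch (not VECO's implementation): an encoder block with an extra
# cross-attention step into a parallel sentence from another language.
import torch
import torch.nn as nn

class CrossLingualEncoderLayer(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor, other_lang: torch.Tensor) -> torch.Tensor:
        # standard self-attention over the current language
        x = self.norm1(x + self.self_attn(x, x, x, need_weights=False)[0])
        # cross-attention into the parallel sentence in the other language
        x = self.norm2(x + self.cross_attn(x, other_lang, other_lang, need_weights=False)[0])
        return self.norm3(x + self.ffn(x))

# toy usage: 2 sentences of 16 tokens attending to 20-token parallel sentences
layer = CrossLingualEncoderLayer()
out = layer(torch.randn(2, 16, 768), torch.randn(2, 20, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```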
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
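One simple way to quantify such bias in embeddings, shown here as a hedged illustration rather than the paper's proposed metrics: compare how much closer profession words sit to one gendered anchor set than to the other.

```python
# Minimal sketch (an illustrative metric, not the paper's proposed measures):
# for each profession word, compare its mean cosine similarity to "male" vs.
# "female" anchor words; a nonzero average gap indicates a gendered skew.
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def bias_gap(prof_vecs, male_vecs, female_vecs) -> float:
    gaps = []
    for p in prof_vecs:
        male_sim = np.mean([cos(p, m) for m in male_vecs])
        female_sim = np.mean([cos(p, f) for f in female_vecs])
        gaps.append(male_sim - female_sim)
    return float(np.mean(gaps))

# toy usage with random vectors standing in for multilingual word embeddings
rng = np.random.default_rng(0)
print(bias_gap(rng.normal(size=(10, 32)), rng.normal(size=(4, 32)), rng.normal(size=(4, 32))))
```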
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
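A minimal sketch of singular vector canonical correlation analysis in the generic sense (project each view onto its top singular vectors, run CCA, and read off canonical correlations); the views, dimensions, and component counts here are assumptions, not the paper's setup.

```python
# Minimal sketch of SVCCA between two "views" of language representations,
# e.g. typological feature vectors and vectors learned by an NMT/LM model.
# Generic recipe, not the paper's exact procedure.
import numpy as np
from sklearn.cross_decomposition import CCA

def svcca(view_a: np.ndarray, view_b: np.ndarray, keep: int = 10, n_components: int = 5):
    """Project each view onto its top singular vectors, then run CCA."""
    def top_singular_projection(x, k):
        x = x - x.mean(axis=0)
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        return x @ vt[:k].T
    a = top_singular_projection(view_a, keep)
    b = top_singular_projection(view_b, keep)
    a_c, b_c = CCA(n_components=n_components).fit(a, b).transform(a, b)
    # per-component canonical correlations
    return [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(n_components)]

# toy usage: 50 languages, two correlated random views of the same latent base
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 20))
corrs = svcca(base @ rng.normal(size=(20, 30)), base @ rng.normal(size=(20, 25)))
print([round(c, 2) for c in corrs])
```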
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.