Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?
- URL: http://arxiv.org/abs/2212.10879v1
- Date: Wed, 21 Dec 2022 09:44:08 GMT
- Title: Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?
- Authors: Ningyu Xu, Tao Gui, Ruotian Ma, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang
- Abstract summary: Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
- Score: 50.48082721476612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual BERT (mBERT) has demonstrated considerable cross-lingual
syntactic ability, whereby it enables effective zero-shot cross-lingual
transfer of syntactic knowledge. The transfer is more successful between some
languages than others, but it is not well understood what leads to this
variation and whether it fairly reflects differences between languages. In this
work, we
investigate the distributions of grammatical relations induced from mBERT in
the context of 24 typologically different languages. We demonstrate that the
distance between the distributions of different languages is highly consistent
with the syntactic difference in terms of linguistic formalisms. Such
difference learnt via self-supervision plays a crucial role in the zero-shot
transfer performance and can be predicted by variation in morphosyntactic
properties between languages. These results suggest that mBERT properly encodes
languages in a way consistent with linguistic diversity and provide insights
into the mechanism of cross-lingual transfer.
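A minimal sketch of the kind of analysis the abstract describes, under assumptions not stated there: per-language distributions over induced grammatical-relation categories are compared with a Jensen-Shannon distance (one possible metric among several) and related to zero-shot transfer scores. All language codes, probabilities, and scores below are hypothetical placeholders, not the paper's data.

```python
# Sketch only, not the authors' code: compare per-language distributions of
# induced grammatical-relation categories and relate the distances to
# zero-shot transfer. All numbers below are invented placeholders.
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import spearmanr

# Hypothetical distributions over a shared set of induced relation categories
# (each vector sums to 1).
relation_dist = {
    "en": np.array([0.30, 0.25, 0.20, 0.15, 0.10]),
    "de": np.array([0.28, 0.24, 0.22, 0.16, 0.10]),
    "fr": np.array([0.29, 0.23, 0.21, 0.17, 0.10]),
    "ja": np.array([0.18, 0.22, 0.25, 0.20, 0.15]),
    "tr": np.array([0.20, 0.21, 0.24, 0.19, 0.16]),
}

# Hypothetical zero-shot transfer scores (e.g. parsing accuracy) from English.
transfer_from_en = {"de": 0.82, "fr": 0.80, "ja": 0.55, "tr": 0.58}

targets = list(transfer_from_en)
distances = [jensenshannon(relation_dist["en"], relation_dist[t]) for t in targets]
scores = [transfer_from_en[t] for t in targets]

# If the learnt distributional distance tracks syntactic difference, it should
# correlate (negatively) with transfer performance; the paper does this over
# 24 languages, here we only have four toy targets.
rho, _ = spearmanr(distances, scores)
print(dict(zip(targets, np.round(distances, 3))), round(rho, 3))
```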
Related papers
- Assessing the Role of Lexical Semantics in Cross-lingual Transfer through Controlled Manipulations [15.194196775504613]
We analyze how differences between English and a target language influence the capacity to align the language with an English pretrained representation space.
We show that while properties such as script or word order have only a limited impact on alignment quality, the degree of lexical matching between the two languages, which we define using a measure of translation entropy, greatly affects it; a toy sketch of such a measure follows this entry.
arXiv Detail & Related papers (2024-08-14T14:59:20Z)
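One way the translation-entropy measure mentioned above could be realised, purely as a toy reading (the referenced paper may define it differently): average the Shannon entropy of each source word's distribution over target-language translations. The lexicon below is invented for illustration.

```python
# Toy sketch of a translation-entropy measure; the lexicon is made up and the
# referenced paper's actual definition may differ.
import math

def translation_entropy(lexicon: dict) -> float:
    """lexicon maps each source word to translation probabilities summing to 1."""
    per_word = [
        -sum(p * math.log2(p) for p in translations.values() if p > 0)
        for translations in lexicon.values()
    ]
    return sum(per_word) / len(per_word)

toy_lexicon = {
    "dog":   {"Hund": 1.0},                               # one-to-one: entropy 0
    "run":   {"laufen": 0.6, "rennen": 0.4},              # mild ambiguity
    "light": {"Licht": 0.5, "leicht": 0.3, "hell": 0.2},  # more ambiguity
}
# Higher average entropy = weaker lexical matching between the two languages.
print(round(translation_entropy(toy_lexicon), 3))
```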
- Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation [9.838281446902268]
We conduct a large-scale study that varies the auxiliary target-side languages along two dimensions.
We show that linguistically similar target languages exhibit a strong ability to transfer positive knowledge.
As the size of the similar target languages increases, the positive transfer is further enhanced, benefiting the main language pairs.
Meanwhile, distant auxiliary target languages can also unexpectedly benefit the main language pairs, even when their capacity for positive transfer is minimal.
arXiv Detail & Related papers (2024-02-01T10:55:03Z)
- Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer [4.554080966463776]
Multi-lingual language models (LMs) have been remarkably successful in enabling natural language tasks in low-resource languages.
We try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages.
A key finding of this work is that similarities in syntax, morphology, and phonology are good predictors of cross-lingual transfer; a schematic illustration follows this entry.
arXiv Detail & Related papers (2022-12-04T07:22:21Z)
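A schematic illustration of the "typological similarity predicts transfer" finding above. The binary vectors stand in for URIEL-style syntax/morphology/phonology features, and both the features and the transfer scores are made up; this only shows the shape of such an analysis, not the paper's method or data.

```python
# Illustration only: relate (made-up) typological similarity to English with
# (made-up) cross-lingual transfer scores.
import numpy as np
from scipy.stats import spearmanr

features = {  # invented binary typological features
    "en": np.array([1, 0, 1, 1, 0, 1], dtype=float),
    "de": np.array([1, 0, 1, 1, 1, 1], dtype=float),
    "hi": np.array([0, 1, 1, 0, 1, 0], dtype=float),
    "ja": np.array([0, 1, 0, 0, 1, 0], dtype=float),
    "fi": np.array([1, 1, 1, 0, 1, 1], dtype=float),
}
transfer_from_en = {"de": 0.81, "hi": 0.62, "ja": 0.54, "fi": 0.70}  # invented

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = [cosine(features["en"], features[t]) for t in transfer_from_en]
rho, _ = spearmanr(sims, list(transfer_from_en.values()))
# A clearly positive rho would support "typological similarity predicts transfer".
print([round(s, 3) for s in sims], round(rho, 3))
```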
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition, and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer [15.578267998149743]
We show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order.
There is a strong correlation between transfer performance and word embedding alignment between languages.
Our results call for work on multilingual models to focus on explicitly improving word-embedding alignment between languages; one way of measuring such alignment is sketched after this entry.
arXiv Detail & Related papers (2021-10-27T21:25:39Z)
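A hedged sketch of one common way to quantify the word-embedding alignment referenced above (not necessarily the metric used in that paper): fit an orthogonal map over a bilingual seed dictionary, then report the mean cosine similarity of the mapped translation pairs. The embeddings below are random placeholders rather than real fastText or mBERT vectors.

```python
# Sketch of measuring embedding alignment via orthogonal Procrustes; the
# embeddings are random stand-ins, not real bilingual vectors.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
dim, n_pairs = 100, 500
src = rng.normal(size=(n_pairs, dim))                    # source-language vectors
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # unknown "true" rotation
tgt = src @ true_rot + 0.05 * rng.normal(size=(n_pairs, dim))  # noisy translations

# Orthogonal Procrustes: find the rotation R minimising ||src @ R - tgt||_F
# over the seed dictionary pairs.
R, _ = orthogonal_procrustes(src, tgt)
mapped = src @ R

cos = np.sum(mapped * tgt, axis=1) / (
    np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1)
)
# Higher mean cosine = better aligned embedding spaces, the quantity the entry
# above reports as correlating with transfer performance.
print(f"mean cosine after alignment: {cos.mean():.3f}")
```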
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively prevents the model from degenerating into predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- What makes multilingual BERT multilingual? [60.9051207862378]
In this work, we provide an in-depth experimental study to supplement the existing literature on cross-lingual ability.
We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data.
We find that data size and context window size are crucial factors for transferability.
arXiv Detail & Related papers (2020-10-20T05:41:56Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
- Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks [3.9342247746757435]
It is now established that modern neural language models can be successfully trained on multiple languages simultaneously.
But what kind of knowledge is really shared among languages within these models?
In this paper, we dissect different forms of cross-lingual transfer and identify the factors that most determine it.
We find that exposing our LMs to a related language does not always increase grammatical knowledge in the target language, and that optimal conditions for lexical-semantic transfer may not be optimal for syntactic transfer.
arXiv Detail & Related papers (2020-03-31T09:48:25Z)