Finding Universal Grammatical Relations in Multilingual BERT
- URL: http://arxiv.org/abs/2005.04511v2
- Date: Wed, 20 May 2020 08:32:18 GMT
- Title: Finding Universal Grammatical Relations in Multilingual BERT
- Authors: Ethan A. Chi, John Hewitt, Christopher D. Manning
- Abstract summary: We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English.
We present an unsupervised analysis method that provides evidence mBERT learns representations of syntactic dependency labels.
- Score: 47.74015366712623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has found evidence that Multilingual BERT (mBERT), a
transformer-based multilingual masked language model, is capable of zero-shot
cross-lingual transfer, suggesting that some aspects of its representations are
shared cross-lingually. To better understand this overlap, we extend recent
work on finding syntactic trees in neural networks' internal representations to
the multilingual setting. We show that subspaces of mBERT representations
recover syntactic tree distances in languages other than English, and that
these subspaces are approximately shared across languages. Motivated by these
results, we present an unsupervised analysis method that provides evidence
mBERT learns representations of syntactic dependency labels, in the form of
clusters which largely agree with the Universal Dependencies taxonomy. This
evidence suggests that even without explicit supervision, multilingual masked
language models learn certain linguistic universals.
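As a concrete illustration of the probing approach the abstract refers to, the sketch below implements the structural-probe distance of Hewitt and Manning (2019), which this paper fits on mBERT representations across languages: a linear map B is trained so that squared distances ||B(h_i - h_j)||^2 approximate syntactic parse-tree distances. This is a minimal sketch assuming PyTorch; the hidden states and tree distances are random placeholders rather than real mBERT outputs and UD parses, and it is not the authors' released code.

```python
import torch

d_model, probe_rank, seq_len = 768, 64, 12

# Placeholder inputs; in practice these come from mBERT hidden states and
# gold Universal Dependencies parse-tree distances.
hidden = torch.randn(seq_len, d_model)
gold_dist = torch.randint(1, 6, (seq_len, seq_len)).float()
gold_dist = (gold_dist + gold_dist.T) / 2        # symmetrize
gold_dist.fill_diagonal_(0)                      # distance of a token to itself is 0

# The probe is a single linear map B; distances are measured after applying B.
B = torch.randn(probe_rank, d_model, requires_grad=True)
opt = torch.optim.Adam([B], lr=1e-3)

def probe_distances(h, B):
    """Squared L2 distances ||B(h_i - h_j)||^2 for all token pairs."""
    transformed = h @ B.T                        # (seq_len, probe_rank)
    diff = transformed[:, None, :] - transformed[None, :, :]
    return (diff ** 2).sum(-1)                   # (seq_len, seq_len)

for step in range(100):
    pred = probe_distances(hidden, B)
    # L1 loss between predicted and gold tree distances, as in the original probe.
    loss = (pred - gold_dist).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the multilingual setting studied here, such probes are fit per language and the learned subspaces are then compared, which is how the paper argues they are approximately shared across languages.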
Related papers
- Discovering Low-rank Subspaces for Language-agnostic Multilingual
Representations [38.56175462620892]
Large pretrained multilingual language models (ML-LMs) have shown remarkable zero-shot cross-lingual transfer capabilities.
We present a novel view of projecting away language-specific factors from a multilingual embedding space (a rough sketch of this general idea appears after this list).
We show that applying our method consistently leads to improvements over commonly used ML-LMs.
arXiv Detail & Related papers (2024-01-11T09:54:11Z)
- Discovering Language-neutral Sub-networks in Multilingual Language Models [15.94622051535847]
Language neutrality of multilingual models is a function of the overlap between language-encoding sub-networks of these models.
Using mBERT as a foundation, we employ the lottery ticket hypothesis to discover sub-networks that are individually optimized for various languages and tasks.
We conclude that mBERT comprises a language-neutral sub-network shared among many languages, along with multiple ancillary language-specific sub-networks.
arXiv Detail & Related papers (2022-05-25T11:35:41Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition, and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while composition is more crucial to the success of cross-lingual transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- An Isotropy Analysis in the Multilingual BERT Embedding Space [18.490856440975996]
We investigate the representation degeneration problem in multilingual contextual word representations (CWRs) of BERT.
Our results show that increasing the isotropy of the multilingual embedding space can significantly improve its representation power and performance.
Our analysis indicates that although the degenerated directions vary in different languages, they encode similar linguistic knowledge, suggesting a shared linguistic space among languages.
arXiv Detail & Related papers (2021-10-09T08:29:49Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and call each group a representation sprachbund.
Experiments on cross-lingual benchmarks show significant improvements over strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improving the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
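The low-rank-subspace and isotropy entries above both point toward removing language-specific directions from a shared multilingual embedding space. The sketch below (referenced from the first entry) shows one generic way to do this: estimate a small language-specific subspace from per-language mean vectors and project it out. It is an illustrative simplification with random placeholder embeddings, not the exact method of any of the papers listed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768
# Placeholder data: vectors grouped by language; real inputs would be mBERT embeddings.
embs = {lang: rng.normal(size=(200, d)) + rng.normal(size=d)  # per-language offset
        for lang in ["en", "fr", "de", "zh"]}

# Estimate a low-rank language-specific subspace from per-language mean vectors.
means = np.stack([e.mean(axis=0) for e in embs.values()])     # (num_languages, d)
means -= means.mean(axis=0, keepdims=True)
_, _, vt = np.linalg.svd(means, full_matrices=False)
k = 2                            # rank of the subspace to remove (a free choice)
lang_basis = vt[:k]              # rows are orthonormal directions

def project_out(x, basis):
    """Remove the span of `basis` (orthonormal rows) from the vectors in x."""
    return x - (x @ basis.T) @ basis

language_neutral = {lang: project_out(e, lang_basis) for lang, e in embs.items()}
```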
This list is automatically generated from the titles and abstracts of the papers on this site.