Identifying Necessary Elements for BERT's Multilinguality
- URL: http://arxiv.org/abs/2005.00396v3
- Date: Mon, 8 Feb 2021 14:51:56 GMT
- Title: Identifying Necessary Elements for BERT's Multilinguality
- Authors: Philipp Dufter, Hinrich Schütze
- Abstract summary: Multilingual BERT (mBERT) yields high-quality multilingual representations and enables effective zero-shot transfer.
We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual.
- Score: 4.822598110892846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been shown that multilingual BERT (mBERT) yields high-quality
multilingual representations and enables effective zero-shot transfer. This is
surprising given that mBERT does not use any crosslingual signal during
training. While recent literature has studied this phenomenon, the reasons for
the multilinguality are still somewhat obscure. We aim to identify
architectural properties of BERT and linguistic properties of languages that
are necessary for BERT to become multilingual. To allow for fast
experimentation, we propose an efficient setup with small BERT models trained on
a mix of synthetic and natural data. Overall, we identify four architectural
and two linguistic elements that influence multilinguality. Based on our
insights, we experiment with a multilingual pretraining setup that modifies the
masking strategy using VecMap, i.e., unsupervised embedding alignment.
Experiments on XNLI with three languages indicate that our findings transfer
from our small setup to larger scale settings.
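The abstract's mention of a VecMap-based masking strategy points to one way unsupervised embedding alignment could feed into pretraining. The sketch below is a minimal illustration under stated assumptions, not the paper's procedure: it assumes source- and target-language word embeddings have already been projected into a shared space (for example with VecMap) and, with some probability, swaps a masked token for its cross-lingual nearest neighbour. The file format, the `rate` parameter, and the substitution rule are assumptions made for this example.

```python
# Minimal sketch, not the paper's exact procedure: given word embeddings
# already aligned into one space (e.g., with VecMap), occasionally replace
# a masked token with its cross-lingual nearest neighbour instead of [MASK].
import numpy as np

def load_embeddings(path):
    """Read a word2vec-style text file: header 'num_words dim', then 'word v1 v2 ...'."""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the header line
        for line in f:
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    mat = np.vstack(vecs)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # unit norm -> dot product = cosine
    return words, mat

def nearest_target_word(word, src_words, src_mat, tgt_words, tgt_mat):
    """Target-language word closest to `word` in the shared embedding space."""
    sims = tgt_mat @ src_mat[src_words.index(word)]
    return tgt_words[int(np.argmax(sims))]

def alignment_aware_masking(tokens, mask_positions, translate, rate=0.5, seed=0):
    """Standard [MASK] replacement, except that with probability `rate` the
    masked position is filled with a cross-lingual neighbour (assumed rule)."""
    rng = np.random.default_rng(seed)
    out = list(tokens)
    for pos in mask_positions:
        out[pos] = translate(tokens[pos]) if rng.random() < rate else "[MASK]"
    return out

# Hypothetical usage with embeddings previously aligned by VecMap (file names are placeholders):
# src_words, src_mat = load_embeddings("en.mapped.vec")
# tgt_words, tgt_mat = load_embeddings("de.mapped.vec")
# translate = lambda w: nearest_target_word(w, src_words, src_mat, tgt_words, tgt_mat)
# alignment_aware_masking(["the", "dog", "barks"], [1], translate)
```

One plausible design choice (again an assumption, not the paper's description) is to keep predicting the original token at substituted positions, so the model is exposed to translation pairs in identical contexts without any explicit parallel data.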
Related papers
- L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT [0.7874708385247353]
The multilingual Sentence-BERT (SBERT) models map different languages to a common representation space.
We propose a simple yet effective approach to convert vanilla multilingual BERT models into multilingual sentence BERT models using a synthetic corpus.
We show that multilingual BERT models are inherently cross-lingual learners and that this simple baseline fine-tuning approach yields exceptional cross-lingual properties.
arXiv Detail & Related papers (2023-04-22T15:45:40Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study multilingual language models to understand their capability and adaptability in the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- To What Degree Can Language Borders Be Blurred In BERT-based Multilingual Spoken Language Understanding? [7.245261469258502]
We show that although a BERT-based multilingual Spoken Language Understanding (SLU) model performs reasonably well even on distant language groups, there is still a gap to ideal multilingual performance.
We propose a novel BERT-based adversarial model architecture to learn language-shared and language-specific representations for multilingual SLU.
arXiv Detail & Related papers (2020-11-10T09:59:24Z)
- What makes multilingual BERT multilingual? [60.9051207862378]
In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability.
We compare the cross-lingual ability of non-contextualized and contextualized representation models trained on the same data.
We find that data size and context window size are crucial factors for transferability.
arXiv Detail & Related papers (2020-10-20T05:41:56Z)
- Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
arXiv Detail & Related papers (2020-10-20T05:41:35Z)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning; a toy probe in this spirit is sketched after this list.
arXiv Detail & Related papers (2020-10-16T09:49:32Z)
- Finding Universal Grammatical Relations in Multilingual BERT [47.74015366712623]
We show that subspaces of mBERT representations recover syntactic tree distances in languages other than English.
We present an unsupervised analysis method that provides evidence that mBERT learns representations of syntactic dependency labels.
arXiv Detail & Related papers (2020-05-09T20:46:02Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
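To make the word-level translation finding above (It's not Greek to mBERT) concrete, here is a minimal probe that assumes nothing beyond the public `transformers` API: it averages mBERT's input (non-contextual) embeddings over a word's subword pieces and retrieves the most similar word from a small candidate list in another language. The English/German word lists are toy examples chosen here, and this is not one of the paper's two methods.

```python
# Toy cross-lingual nearest-neighbour probe over mBERT's input embeddings.
# Word lists are illustrative; this sketch is not the paper's method.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
emb = bert.get_input_embeddings().weight.detach()  # (vocab_size, hidden_size)

def word_vector(word):
    """Average the input embeddings of the word's subword pieces, unit-normalised."""
    ids = tok(word, add_special_tokens=False)["input_ids"]
    v = emb[ids].mean(dim=0)
    return v / v.norm()

english = ["dog", "house", "water"]
candidates = ["Hund", "Haus", "Wasser", "Baum"]  # small German candidate pool
cand_matrix = torch.stack([word_vector(w) for w in candidates])

for w in english:
    sims = cand_matrix @ word_vector(w)          # cosine similarities
    print(f"{w} -> {candidates[int(sims.argmax())]}")
```

Note that this probe only looks at the static input embedding matrix; the contextual layers may carry additional translation information that such a simple lookup does not capture.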