How Do Multilingual Encoders Learn Cross-lingual Representation?
- URL: http://arxiv.org/abs/2207.05737v1
- Date: Tue, 12 Jul 2022 17:57:05 GMT
- Title: How Do Multilingual Encoders Learn Cross-lingual Representation?
- Authors: Shijie Wu
- Abstract summary: Cross-lingual transfer benefits languages with little to no training data by transferring from other languages.
This thesis first shows such surprising cross-lingual effectiveness compared against prior art on various tasks.
We also look at how to inject different cross-lingual signals into multilingual encoders, and the optimization behavior of cross-lingual transfer with these models.
- Score: 8.409283426564977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NLP systems typically require support for more than one language. As
different languages have different amounts of supervision, cross-lingual
transfer benefits languages with little to no training data by transferring
from other languages. From an engineering perspective, multilingual NLP
benefits development and maintenance by serving multiple languages with a
single system. Both cross-lingual transfer and multilingual NLP rely on
cross-lingual representations serving as the foundation. As BERT revolutionized
representation learning and NLP, it also revolutionized cross-lingual
representations and cross-lingual transfer. Multilingual BERT was released as a
replacement for single-language BERT, trained with Wikipedia data in 104
languages.
Surprisingly, without any explicit cross-lingual signal, multilingual BERT
learns cross-lingual representations in addition to representations for
individual languages. This thesis first shows such surprising cross-lingual
effectiveness compared against prior art on various tasks. Naturally, it raises
a set of questions, most notably how do these multilingual encoders learn
cross-lingual representations. In exploring these questions, this thesis will
analyze the behavior of multilingual models in a variety of settings on high
and low resource languages. We also look at how to inject different
cross-lingual signals into multilingual encoders, and the optimization behavior
of cross-lingual transfer with these models. Together, they provide a better
understanding of multilingual encoders on cross-lingual transfer. Our findings
will lead us to suggested improvements to multilingual encoders and
cross-lingual transfer.
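As a concrete illustration of the zero-shot transfer setting described in the abstract (fine-tune a multilingual encoder on labeled data in one language, then evaluate it directly on another language), the sketch below assumes the Hugging Face transformers library, the bert-base-multilingual-cased checkpoint, and an XNLI-style three-way classification head; these choices are illustrative and not the thesis's exact experimental setup.

```python
# Minimal sketch of zero-shot cross-lingual transfer with multilingual BERT.
# Assumes: Hugging Face `transformers`, PyTorch, and an XNLI-style 3-way NLI task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3  # entailment / neutral / contradiction
)

# Step 1 (omitted here): fine-tune `model` on English NLI data only.
# Step 2: evaluate directly on another language, without target-language labels.
premise = "Der Himmel ist heute klar und blau."    # German premise
hypothesis = "Der Himmel hat heute eine Farbe."    # German hypothesis
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = logits.argmax(dim=-1).item()  # label id under the English-trained head
print(predicted_label)
```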
Related papers
- mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models? [15.90185747024602]
We propose a synthetic task, Multilingual Othello (mOthello), as a testbed for studying when cross-lingual representation alignment and cross-lingual transfer emerge.
We find that models trained with naive multilingual pretraining fail to learn a language-neutral representation across all input languages.
We propose a novel approach - multilingual pretraining with unified output space - that both induces the learning of language-neutral representation and facilitates cross-lingual transfer.
arXiv Detail & Related papers (2024-04-18T18:03:08Z) - Bitext Mining Using Distilled Sentence Representations for Low-Resource
Languages [12.00637655338665]
We study very low-resource languages and handle 50 African languages, many of which are not covered by any other model.
For these languages, we train sentence encoders, mine bitexts, and validate the bitexts by training NMT systems.
arXiv Detail & Related papers (2022-05-25T10:53:24Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs).
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Syntax-augmented Multilingual BERT for Cross-lingual Transfer [37.99210035238424]
This work shows that explicitly providing language syntax when training mBERT helps cross-lingual transfer.
Experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks.
arXiv Detail & Related papers (2021-06-03T21:12:50Z) - First Align, then Predict: Understanding the Cross-Lingual Ability of
Multilingual BERT [2.2931318723689276]
Cross-lingual transfer emerges from fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning.
We show that multilingual BERT can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor.
While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for the transfer and can be reinitialized during fine-tuning.
arXiv Detail & Related papers (2021-01-26T22:12:38Z) - What makes multilingual BERT multilingual? [60.9051207862378]
In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability.
We compare the cross-lingual ability of non-contextualized and contextualized representation models trained with the same data.
We find that data size and context window size are crucial factors for transferability.
arXiv Detail & Related papers (2020-10-20T05:41:56Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM fine-tuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language (a minimal sketch of such a loss follows this list).
arXiv Detail & Related papers (2020-09-10T22:42:15Z) - A Study of Cross-Lingual Ability and Language-specific Information in
Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z) - Learning to Scale Multilingual Representations for Vision-Language Tasks [51.27839182889422]
The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date.
We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.
arXiv Detail & Related papers (2020-04-09T01:03:44Z)
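The KL-divergence self-teaching loss mentioned in the FILTER entry above can be written generically as follows. This is a sketch assuming PyTorch, with the model's own soft predictions on one input (detached, acting as pseudo-labels) supervising its predictions on the translated text; function and argument names are illustrative, not the paper's implementation.

```python
# Generic sketch of a KL-divergence self-teaching loss: soft pseudo-labels from
# one view of an example (e.g. the source-language text) supervise the model's
# predictions on its translation in the target language. Assumes PyTorch.
import torch
import torch.nn.functional as F

def self_teaching_kl_loss(pseudo_label_logits: torch.Tensor,
                          translation_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between detached soft pseudo-labels and the predictions
    made on the translated text."""
    teacher_probs = F.softmax(pseudo_label_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(translation_logits / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example: combine with the supervised task loss during fine-tuning.
# total_loss = task_loss + self_teaching_kl_loss(source_logits, target_logits)
```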