Beyond English-Centric Bitexts for Better Multilingual Language
Representation Learning
- URL: http://arxiv.org/abs/2210.14867v1
- Date: Wed, 26 Oct 2022 17:16:52 GMT
- Title: Beyond English-Centric Bitexts for Better Multilingual Language
Representation Learning
- Authors: Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu
Wei, Vishrav Chaudhary and Xia Song
- Abstract summary: We show that going beyond English-centric bitexts, coupled with a novel sampling strategy, substantially boosts performance across model sizes.
Our XY-LENT XL variant outperforms XLM-R XXL and exhibits competitive performance with mT5 XXL while being 5x and 6x smaller, respectively.
- Score: 99.42850643947439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we elaborate upon recipes for building multilingual
representation models that are not only competitive with existing
state-of-the-art models but are also more parameter efficient, thereby
promoting better adoption in resource-constrained scenarios and practical
applications. We show that going beyond English-centric bitexts, coupled with a
novel sampling strategy aimed at reducing under-utilization of training data,
substantially boosts performance across model sizes for both Electra and MLM
pre-training objectives. We introduce XY-LENT: X-Y bitext enhanced Language
ENcodings using Transformers, which not only achieves state-of-the-art
performance over 5 cross-lingual tasks within all model size bands, but is also
competitive across bands. Our XY-LENT XL variant outperforms XLM-R XXL and
exhibits competitive performance with mT5 XXL while being 5x and 6x smaller
respectively. We then show that our proposed method helps ameliorate the curse
of multilinguality, with the XY-LENT XL achieving 99.3% GLUE performance and
98.5% SQuAD 2.0 performance compared to a SoTA English-only model in the same
size band. We then analyze our model's performance on extremely low-resource
languages and posit that scaling alone may not be sufficient for improving
performance in this scenario.
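The abstract credits part of the gains to a sampling strategy that reduces under-utilization of training data across X-Y bitext pairs, but the strategy itself is not spelled out here. For context only, the sketch below shows the standard temperature-based (exponentially smoothed) sampling over language-pair corpora that such strategies are typically compared against; the corpus sizes and temperature are illustrative, not taken from the paper.

```python
import numpy as np

def temperature_sampling_probs(pair_sizes, temperature=0.7):
    """Exponentially smoothed sampling probabilities over bitext language pairs.

    pair_sizes: dict mapping an (X, Y) language pair to its corpus size.
    A temperature below 1 upsamples low-resource pairs relative to their raw share.
    """
    pairs = list(pair_sizes)
    sizes = np.array([pair_sizes[p] for p in pairs], dtype=np.float64)
    probs = sizes / sizes.sum()        # raw corpus proportions
    smoothed = probs ** temperature    # exponential smoothing
    smoothed /= smoothed.sum()         # renormalize to a distribution
    return dict(zip(pairs, smoothed))

# Illustrative corpus sizes in sentence pairs (not from the paper).
sizes = {("en", "fr"): 40_000_000, ("en", "sw"): 300_000, ("fr", "de"): 8_000_000}
print(temperature_sampling_probs(sizes, temperature=0.7))
```

Lowering the temperature shifts probability mass toward low-resource pairs at the cost of sampling high-resource pairs less often; managing exactly this under-utilization trade-off is what the paper's proposed strategy targets.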
Related papers
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - Multilingual Sentence-T5: Scalable Sentence Encoders for Multilingual Applications [4.240899165468488]
We introduce Multilingual Sentence T5 (m-ST5), a larger-scale NLI-based multilingual sentence embedding model.
By employing the low-rank adaptation (LoRA) technique, we have achieved a successful scaling of the model's size to 5.7 billion parameters.
It was particularly noteworthy that languages with fewer resources or those with less linguistic similarity to English benefited more from the parameter increase.
arXiv Detail & Related papers (2024-03-26T09:31:55Z) - Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models [110.10545153845051]
- Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models [110.10545153845051]
Cross-lingual Expert Language Models (X-ELM) specialize individual experts to different languages while remaining effective as a multilingual ensemble.
X-ELM provides additional benefits beyond performance improvements: new experts can be iteratively added, adapting X-ELM to new languages without catastrophic forgetting.
arXiv Detail & Related papers (2024-01-19T01:07:50Z) - EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning [38.928786416891924]
We introduce efficient and effective massively multilingual sentence embedding (EMS) using cross-lingual token-level reconstruction (XTR) and sentence-level contrastive learning as training objectives.
Compared with related studies, the proposed model can be efficiently trained using significantly fewer parallel sentences and GPU computation resources.
We release the codes for model training and the EMS pre-trained sentence embedding model, which supports 62 languages.
arXiv Detail & Related papers (2022-05-31T12:29:25Z) - PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
- PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z) - Larger-Scale Transformers for Multilingual Masked Language Modeling [16.592883204398518]
Two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI.
Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages.
arXiv Detail & Related papers (2021-05-02T23:15:02Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)