Related papers: MonoByte: A Pool of Monolingual Byte-level Language Models

MonoByte: A Pool of Monolingual Byte-level Language Models

URL: http://arxiv.org/abs/2209.11035v1
Date: Thu, 22 Sep 2022 14:32:48 GMT
Title: MonoByte: A Pool of Monolingual Byte-level Language Models
Authors: Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
Abstract summary: We release 10 monolingual byte-level models rigorously pretrained under the same configuration. Because they are tokenizer-free, the problem of unseen token embeddings is eliminated. Experiments on QA and NLI tasks show that our monolingual models achieve competitive performance to the multilingual one.
Score: 4.491765479948667
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The zero-shot cross-lingual ability of models pretrained on multilingual and even monolingual corpora has spurred many hypotheses to explain this intriguing empirical result. However, due to the costs of pretraining, most research uses public models whose pretraining methodology, such as the choice of tokenization, corpus size, and computational budget, might differ drastically. When researchers pretrain their own models, they often do so under a constrained budget, and the resulting models might underperform significantly compared to SOTA models. These experimental differences led to various inconsistent conclusions about the nature of the cross-lingual ability of these models. To help further research on the topic, we released 10 monolingual byte-level models rigorously pretrained under the same configuration with a large compute budget (equivalent to 420 days on a V100) and corpora that are 4 times larger than the original BERT's. Because they are tokenizer-free, the problem of unseen token embeddings is eliminated, thus allowing researchers to try a wider range of cross-lingual experiments in languages with different scripts. Additionally, we release two models pretrained on non-natural language texts that can be used in sanity-check experiments. Experiments on QA and NLI tasks show that our monolingual models achieve competitive performance to the multilingual one, and hence can be served to strengthen our understanding of cross-lingual transferability in language models.

Related papers

Scaling Laws for Multilingual Language Models [41.6318470003173]
A primary challenge in studying multilingual scaling is the difficulty of analyzing individual language performance due to cross-lingual transfer. We introduce and validate a hypothesis that the test cross-entropy loss for each language family is determined solely by its own sampling ratio. We derive a power-law relationship that links performance with dataset size, model size and sampling ratios.
arXiv Detail & Related papers (2024-10-15T20:29:38Z)
Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models [11.421452042888523]
We compare different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques. Our results offer practical suggestions, for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks.
arXiv Detail & Related papers (2024-08-26T16:29:13Z)
Do Multilingual Large Language Models Mitigate Stereotype Bias? [9.31741279000585]
This study systematically trains six LLMs of identical size and architecture in English, German, French, Italian, and Spanish. We observe that multilingual models achieve not only lower bias but also superior prediction accuracy when compared to monolingual models.
arXiv Detail & Related papers (2024-07-08T08:46:50Z)
Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability [11.000499414131324]
We conduct experiments across four types of reasoning tasks. We find that the multilingual pretrained model does not always outperform an English-centric model. English appears to be a less suitable source language, and the choice of source language becomes less important when the English-centric model scales up.
arXiv Detail & Related papers (2023-06-11T14:03:09Z)
Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models [28.036233760742125]
We causally probe multilingual language models (XGLM and multilingual BERT) across various languages. We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models.
arXiv Detail & Related papers (2022-10-25T20:43:36Z)
Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process. We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency [62.0887259003594]
This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency. Experiments on nine downstream tasks show several counter-intuitive phenomena. We present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference.
arXiv Detail & Related papers (2022-04-06T06:29:52Z)
Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks. We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z)
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext. We show that multilingual translation models can be created through multilingual finetuning. We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
Structure-Level Knowledge Distillation For Multilingual Sequence Labeling [73.40368222437912]
We propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models to the unified multilingual model (student) Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
arXiv Detail & Related papers (2020-04-08T07:14:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.