Training Multilingual Pre-trained Language Model with Byte-level
Subwords
- URL: http://arxiv.org/abs/2101.09469v1
- Date: Sat, 23 Jan 2021 10:01:28 GMT
- Title: Training Multilingual Pre-trained Language Model with Byte-level
Subwords
- Authors: Junqiu Wei, Qun Liu, Yinpeng Guo, Xin Jiang
- Abstract summary: We present our practices on training multilingual pre-trained language models with BBPE: Byte-Level BPE (i.e., Byte Pair.
In the experiment, we adopted the architecture of NEZHA as the underlying pre-trained language model and the results show that NEZHA trained with byte-level subwords consistently.
We release the source code of our byte-level vocabulary building tools and the multilingual pre-trained language models.
- Score: 41.52056437015399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The pre-trained language models have achieved great successes in various
natural language understanding (NLU) tasks due to its capacity to capture the
deep contextualized information in text by pre-training on large-scale corpora.
One of the fundamental components in pre-trained language models is the
vocabulary, especially for training multilingual models on many different
languages. In the technical report, we present our practices on training
multilingual pre-trained language models with BBPE: Byte-Level BPE (i.e., Byte
Pair Encoding). In the experiment, we adopted the architecture of NEZHA as the
underlying pre-trained language model and the results show that NEZHA trained
with byte-level subwords consistently outperforms Google multilingual BERT and
vanilla NEZHA by a notable margin in several multilingual NLU tasks. We release
the source code of our byte-level vocabulary building tools and the
multilingual pre-trained language models.
Related papers
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z) - Generalizing Multimodal Pre-training into Multilingual via Language
Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been taken to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a textbfMultitextbfLingual textbfAcquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into multilingual.
arXiv Detail & Related papers (2022-05-29T08:53:22Z) - Pre-training Data Quality and Quantity for a Low-Resource Language: New
Corpus and BERT Models for Maltese [4.4681678689625715]
We analyse the effect of pre-training with monolingual data for a low-resource language.
We present a newly created corpus for Maltese, and determine the effect that the pre-training data size and domain have on the downstream performance.
We compare two models on the new corpus: a monolingual BERT model trained from scratch (BERTu), and a further pre-trained multilingual BERT (mBERTu)
arXiv Detail & Related papers (2022-05-21T06:44:59Z) - HerBERT: Efficiently Pretrained Transformer-based Language Model for
Polish [4.473327661758546]
This paper presents the first ablation study focused on Polish, which, unlike the isolating English language, is a fusional language.
We design and thoroughly evaluate a pretraining procedure of transferring knowledge from multilingual to monolingual BERT-based models.
Based on the proposed procedure, a Polish BERT-based language model -- HerBERT -- is trained.
arXiv Detail & Related papers (2021-05-04T20:16:17Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Towards Fully Bilingual Deep Language Modeling [1.3455090151301572]
We consider whether it is possible to pre-train a bilingual model for two remotely related languages without compromising performance at either language.
We create a Finnish-English bilingual BERT model and evaluate its performance on datasets used to evaluate the corresponding monolingual models.
Our bilingual model performs on par with Google's original English BERT on GLUE and nearly matches the performance of monolingual Finnish BERT on a range of Finnish NLP tasks.
arXiv Detail & Related papers (2020-10-22T12:22:50Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z) - WikiBERT models: deep transfer learning for many languages [1.3455090151301572]
We introduce a simple, fully automated pipeline for creating languagespecific BERT models from Wikipedia data.
We assess the merits of these models using the state-of-the-art UDify on Universal Dependencies data.
arXiv Detail & Related papers (2020-06-02T11:57:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.