HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
- URL: http://arxiv.org/abs/2105.01735v1
- Date: Tue, 4 May 2021 20:16:17 GMT
- Title: HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
- Authors: Robert Mroczkowski, Piotr Rybak, Alina Wróblewska, Ireneusz Gawlik
- Abstract summary: This paper presents the first ablation study focused on Polish, which, unlike the isolating English language, is a fusional language.
We design and thoroughly evaluate a pretraining procedure of transferring knowledge from multilingual to monolingual BERT-based models.
Based on the proposed procedure, a Polish BERT-based language model -- HerBERT -- is trained.
- Score: 4.473327661758546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: BERT-based models are currently used for solving nearly all Natural Language
Processing (NLP) tasks and most often achieve state-of-the-art results.
Therefore, the NLP community conducts extensive research on understanding these
models, but above all on designing effective and efficient training procedures.
Several ablation studies investigating how to train BERT-like models have been
carried out, but the vast majority of them concerned only the English language.
A training procedure designed for English is not necessarily universal or
applicable to other, typologically different languages. Therefore,
this paper presents the first ablation study focused on Polish, which, unlike
the isolating English language, is a fusional language. We design and
thoroughly evaluate a pretraining procedure of transferring knowledge from
multilingual to monolingual BERT-based models. In addition to multilingual
model initialization, other factors that may influence pretraining are
also explored, i.e., training objective, corpus size, BPE-Dropout, and
pretraining length. Based on the proposed procedure, a Polish BERT-based
language model -- HerBERT -- is trained. This model achieves state-of-the-art
results on multiple downstream tasks.
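To make the knowledge-transfer step concrete, below is a minimal sketch, assuming the general recipe of vocabulary-matched initialization rather than the authors' released code: embeddings of tokens shared between the multilingual and the new Polish vocabulary are copied, the remaining entries are initialized randomly, and the Transformer layers keep their multilingual weights before masked-language-model pretraining continues on Polish text. The local tokenizer path is a placeholder, and the Hugging Face transformers API is used purely for illustration.

```python
# Minimal sketch of multilingual-to-monolingual initialization; an illustration
# of the general idea, not the authors' released implementation.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

multi_tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Hypothetical tokenizer trained beforehand on a Polish corpus (placeholder path).
mono_tok = AutoTokenizer.from_pretrained("./polish-tokenizer")

old_emb = model.get_input_embeddings().weight.data.clone()
hidden_size = old_emb.size(1)

# New embedding matrix: random initialization, then copy rows of shared tokens.
new_emb = torch.empty(len(mono_tok), hidden_size).normal_(mean=0.0, std=0.02)
multi_vocab = multi_tok.get_vocab()
for token, new_id in mono_tok.get_vocab().items():
    old_id = multi_vocab.get(token)
    if old_id is not None:  # token also exists in the multilingual vocabulary
        new_emb[new_id] = old_emb[old_id]

model.resize_token_embeddings(len(mono_tok))
model.get_input_embeddings().weight.data.copy_(new_emb)
# Encoder layers keep their multilingual weights; pretraining then continues
# on Polish data with the masked language modelling objective (BPE-Dropout,
# if used, is applied at the tokenization stage and is not shown here).
```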
Related papers
- Comparison of Pre-trained Language Models for Turkish Address Parsing [0.0]
We focus on Turkish maps data and thoroughly evaluate both multilingual and Turkish-based BERT, DistilBERT, ELECTRA, and RoBERTa.
We also propose a MultiLayer Perceptron (MLP) for fine-tuning BERT in addition to the standard approach of one-layer fine-tuning.
arXiv: 2023-06-24
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv: 2022-11-10
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv: 2022-10-23
- Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages [0.0]
We train a trilingual LitLat BERT-like model for Lithuanian, Latvian, and English, and a monolingual Est-RoBERTa model for Estonian.
We evaluate their performance on four downstream tasks: named entity recognition, dependency parsing, part-of-speech tagging, and word analogy.
arXiv: 2021-12-20
- Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey [67.82942975834924]
Large, pre-trained language models such as BERT have drastically changed the Natural Language Processing (NLP) field.
We present a survey of recent work that uses these large language models to solve NLP tasks via pre-training then fine-tuning, prompting, or text generation approaches.
arXiv: 2021-11-01
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their sizes.
arXiv: 2021-10-14
- EstBERT: A Pretrained Language-Specific BERT for Estonian [0.3674863913115431]
This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian.
Recent work has evaluated multilingual BERT models on Estonian tasks and found them to outperform the baselines.
We show that the models based on EstBERT outperform multilingual BERT models on five tasks out of six.
arXiv: 2020-11-09
- Pretrained Language Model Embryology: The Birth of ALBERT [68.5801642674541]
We investigate the developmental process from a set of randomly initialized parameters to a totipotent language model.
Our results show that ALBERT learns to reconstruct and predict tokens of different parts of speech (POS) at different speeds during pretraining.
These findings suggest that the knowledge of a pretrained model varies during pretraining, and that more pretraining steps do not necessarily provide a model with more comprehensive knowledge.
arXiv: 2020-10-06
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv: 2020-07-31
- WikiBERT models: deep transfer learning for many languages [1.3455090151301572]
We introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data.
We assess the merits of these models using the state-of-the-art UDify on Universal Dependencies data.
arXiv: 2020-06-02
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.