EstBERT: A Pretrained Language-Specific BERT for Estonian
- URL: http://arxiv.org/abs/2011.04784v3
- Date: Wed, 28 Apr 2021 09:44:08 GMT
- Title: EstBERT: A Pretrained Language-Specific BERT for Estonian
- Authors: Hasan Tanvir, Claudia Kittask, Sandra Eiche, Kairit Sirts
- Abstract summary: This paper presents EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian.
Recent work has evaluated multilingual BERT models on Estonian tasks and found them to outperform the baselines.
We show that the models based on EstBERT outperform multilingual BERT models on five tasks out of six.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents EstBERT, a large pretrained transformer-based
language-specific BERT model for Estonian. Recent work has evaluated
multilingual BERT models on Estonian tasks and found them to outperform the
baselines. Still, based on existing studies on other languages, a
language-specific BERT model is expected to improve over the multilingual ones.
We first describe the EstBERT pretraining process and then present the results
of the models based on finetuned EstBERT for multiple NLP tasks, including POS
and morphological tagging, named entity recognition and text classification.
The evaluation results show that the models based on EstBERT outperform
multilingual BERT models on five tasks out of six, providing further evidence
for the view that training language-specific BERT models is still useful, even
when multilingual models are available.
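
As a hedged illustration of the fine-tuning setup described in the abstract, the sketch below fine-tunes EstBERT for a simple binary text-classification task with the Hugging Face Transformers library. The model identifier tartuNLP/EstBERT, the toy Estonian sentences, the label set, and the hyperparameters are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch: fine-tuning EstBERT for text classification with Hugging Face
# Transformers. Model id, data, and hyperparameters are assumptions for
# illustration only, not the paper's actual experimental setup.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

MODEL_ID = "tartuNLP/EstBERT"  # assumed Hugging Face model identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Tiny illustrative dataset (hypothetical Estonian sentences and sentiment labels).
train_data = Dataset.from_dict({
    "text": ["See film oli suurepärane.", "Teenindus oli väga halb."],
    "label": [1, 0],
})

def tokenize(batch):
    # Pad/truncate to a fixed length so the default collator can batch examples.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="estbert-cls",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    train_dataset=train_data,
)
trainer.train()
```

Token-level tasks such as POS and morphological tagging or named entity recognition would follow the same pattern, swapping in AutoModelForTokenClassification and token-aligned labels.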
Related papers
- LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z)
- Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages [0.0]
We train a trilingual LitLat BERT-like model for Lithuanian, Latvian, and English, and a monolingual Est-RoBERTa model for Estonian.
We evaluate their performance on four downstream tasks: named entity recognition, dependency parsing, part-of-speech tagging, and word analogy.
arXiv Detail & Related papers (2021-12-20T14:26:40Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish [4.473327661758546]
This paper presents the first ablation study focused on Polish, which, unlike the isolating English language, is a fusional language.
We design and thoroughly evaluate a pretraining procedure of transferring knowledge from multilingual to monolingual BERT-based models.
Based on the proposed procedure, a Polish BERT-based language model -- HerBERT -- is trained.
arXiv Detail & Related papers (2021-05-04T20:16:17Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Evaluating Multilingual BERT for Estonian [0.8057006406834467]
We evaluate four multilingual models -- multilingual BERT, multilingual distilled BERT, XLM and XLM-RoBERTa -- on several NLP tasks.
Our results show that multilingual BERT models can generalise well on different Estonian NLP tasks.
arXiv Detail & Related papers (2020-10-01T14:48:31Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
- ParsBERT: Transformer-based Model for Persian Language Understanding [0.7646713951724012]
This paper proposes ParsBERT, a monolingual BERT model for the Persian language.
ParsBERT achieves state-of-the-art performance compared to other architectures and multilingual models.
It obtains higher scores on all datasets, including existing ones as well as newly composed ones.
arXiv Detail & Related papers (2020-05-26T05:05:32Z)
- RobBERT: a Dutch RoBERTa-based Language Model [9.797319790710711]
We use RoBERTa to train a Dutch language model called RobBERT.
We measure its performance on various tasks as well as the importance of the fine-tuning dataset size.
RobBERT improves on state-of-the-art results for various tasks and, in particular, significantly outperforms other models on smaller datasets.
arXiv Detail & Related papers (2020-01-17T13:25:44Z)