Bertinho: Galician BERT Representations
- URL: http://arxiv.org/abs/2103.13799v1
- Date: Thu, 25 Mar 2021 12:51:34 GMT
- Title: Bertinho: Galician BERT Representations
- Authors: David Vilares, Marcos Garcia and Carlos Gómez-Rodríguez
- Abstract summary: This paper presents a monolingual BERT model for Galician.
We release two models, built using 6 and 12 transformer layers, respectively.
We show that our models, especially the 12-layer one, outperform the results of mBERT in most tasks.
- Score: 14.341471404165349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a monolingual BERT model for Galician. We follow the
recent trend that shows that it is feasible to build robust monolingual BERT
models even for relatively low-resource languages, while performing better than
the well-known official multilingual BERT (mBERT). More specifically, we
release two monolingual Galician BERT models, built using 6 and 12 transformer
layers, respectively, and trained with limited resources (~45 million tokens on
a single 24GB GPU). We then provide an exhaustive evaluation on a number of
tasks such as POS-tagging, dependency parsing and named entity recognition. For
this purpose, all these tasks are cast in a pure sequence labeling setup in
order to run BERT without the need to include any additional layers on top of
it (we only use an output classification layer to map the contextualized
representations into the predicted label). The experiments show that our
models, especially the 12-layer one, outperform the results of mBERT in most
tasks.
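The sequence-labeling setup described in the abstract needs nothing beyond BERT and a single token-classification (linear) layer. The sketch below shows how such a setup can be wired with the Hugging Face transformers API; the Bertinho model identifier and the label-set size are assumptions for illustration, and the classification head would still have to be fine-tuned on annotated Galician data before its predictions are meaningful.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Pure sequence labeling: BERT plus one output classification layer that maps
# each contextualized token representation to a label (e.g. a POS tag).
# Model identifier and label count are illustrative assumptions; the head is
# randomly initialized until fine-tuned on labeled data.
model_name = "dvilares/bertinho-gl-base-cased"  # assumed Hugging Face identifier
num_labels = 17                                 # e.g. the UPOS tag set size

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=num_labels)

words = ["O", "can", "dorme", "na", "cociña"]   # Galician example sentence
enc = tok(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits                # (1, num_subwords, num_labels)

# Keep one prediction per word: the label assigned to its first subword.
pred = logits.argmax(-1)[0].tolist()
prev = None
for idx, wid in enumerate(enc.word_ids()):
    if wid is not None and wid != prev:
        print(words[wid], model.config.id2label[pred[idx]])
    prev = wid
```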
Related papers
- L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT [0.7874708385247353]
The multilingual Sentence-BERT (SBERT) models map different languages to a common representation space.
We propose a simple yet effective approach to convert vanilla multilingual BERT models into multilingual sentence BERT models using a synthetic corpus.
We show that multilingual BERT models are inherent cross-lingual learners, and that this simple baseline fine-tuning approach yields exceptional cross-lingual properties.
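For context, here is a minimal sketch of the mean-pooling baseline that SBERT-style models start from: sentence embeddings obtained from vanilla multilingual BERT by averaging token vectors. This illustrates the common-representation-space idea only; it is not the training procedure of the paper above.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Mean-pooled sentence embeddings from vanilla multilingual BERT, the usual
# starting point that SBERT-style fine-tuning improves upon.
name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

sentences = ["Como estás?", "How are you?"]         # Galician / English pair
enc = tok(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state         # (batch, seq_len, dim)

mask = enc["attention_mask"].unsqueeze(-1).float()
emb = (hidden * mask).sum(1) / mask.sum(1)          # average over real tokens only
emb = torch.nn.functional.normalize(emb, dim=-1)
print("cosine similarity:", (emb[0] @ emb[1]).item())
```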
arXiv Detail & Related papers (2023-04-22T15:45:40Z)
- Evaluation of contextual embeddings on less-resourced languages [4.417922173735813]
This paper presents the first multilingual empirical comparison of two ELMo models and several monolingual and multilingual BERT models, using 14 tasks in nine languages.
In monolingual settings, monolingual BERT models generally dominate, with a few exceptions such as the dependency parsing task.
In cross-lingual settings, BERT models trained on only a few languages mostly do best, closely followed by massively multilingual BERT models.
arXiv Detail & Related papers (2021-07-22T12:32:27Z)
- Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
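A heavily hedged sketch of what "manipulating the token embeddings" could look like in practice: shift a sentence's input embeddings by the difference between crude per-language centroids and let the masked-language-modeling head decode. The sentences, the centroid estimate and the unscaled shift are all illustrative assumptions; the paper's actual manipulation may differ.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative only: nudge mBERT's output language by shifting token embeddings
# with per-language mean vectors (a rough stand-in, not the paper's method).
name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)
model.eval()
emb_layer = model.get_input_embeddings()

def centroid(sentences):
    ids = tok(sentences, padding=True, return_tensors="pt")["input_ids"]
    # Crude per-language centroid (special and padding tokens included).
    return emb_layer(ids).mean(dim=(0, 1))

mu_en = centroid(["The weather is nice today.", "I like this book."])
mu_es = centroid(["El tiempo es agradable hoy.", "Me gusta este libro."])

enc = tok("The weather is nice today.", return_tensors="pt")
shifted = emb_layer(enc["input_ids"]) + (mu_es - mu_en)   # push toward Spanish
with torch.no_grad():
    logits = model(inputs_embeds=shifted, attention_mask=enc["attention_mask"]).logits
print(tok.decode(logits.argmax(-1)[0]))
```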
arXiv Detail & Related papers (2020-10-20T05:41:35Z)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
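As an illustration of inducing word-level translations without fine-tuning, the sketch below retrieves nearest neighbours in mBERT's representation space over a tiny hand-picked wordlist. The wordlist, the layer choice (final layer) and the subword averaging are assumptions, not necessarily the methods proposed in the paper above.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Nearest-neighbour retrieval of word translations from mBERT representations,
# with no fine-tuning. The tiny word lists below are illustrative only.
name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

def embed(word):
    enc = tok(word, return_tensors="pt")
    with torch.no_grad():
        h = model(**enc).last_hidden_state[0, 1:-1]    # drop [CLS] and [SEP]
    return torch.nn.functional.normalize(h.mean(0), dim=-1)

english = ["dog", "house", "water"]
spanish = ["agua", "perro", "casa"]
es_vecs = torch.stack([embed(w) for w in spanish])

for w in english:
    sims = es_vecs @ embed(w)                          # cosine similarities
    print(w, "->", spanish[int(sims.argmax())])
```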
arXiv Detail & Related papers (2020-10-16T09:49:32Z)
- Evaluating Multilingual BERT for Estonian [0.8057006406834467]
We evaluate four multilingual models -- multilingual BERT, multilingual distilled BERT, XLM and XLM-RoBERTa -- on several NLP tasks.
Our results show that multilingual BERT models can generalise well on different Estonian NLP tasks.
arXiv Detail & Related papers (2020-10-01T14:48:31Z)
- ConvBERT: Improving BERT with Span-based Dynamic Convolution [144.25748617961082]
BERT heavily relies on the global self-attention block and thus suffers from a large memory footprint and high computation cost.
We propose a novel span-based dynamic convolution to replace some of these self-attention heads and directly model local dependencies.
The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
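As a rough illustration of the mechanism, the sketch below implements a simplified dynamic convolution head in PyTorch: each position predicts a small softmax-normalized kernel that mixes a local window of value vectors. ConvBERT's actual span-based variant derives kernels from a local span rather than a single token, so treat this as a stand-in, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConvHead(nn.Module):
    """Simplified dynamic convolution: per-position kernels predicted from the
    input mix a local window of values (a stand-in for ConvBERT's span-based
    dynamic convolution head)."""

    def __init__(self, dim: int, kernel_size: int = 5):
        super().__init__()
        self.kernel_size = kernel_size
        self.to_kernel = nn.Linear(dim, kernel_size)   # one kernel per position
        self.to_value = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (batch, seq, dim)
        b, t, d = x.shape
        k = self.kernel_size
        kernels = F.softmax(self.to_kernel(x), dim=-1)      # (b, t, k)
        v = self.to_value(x).transpose(1, 2).unsqueeze(-1)  # (b, d, t, 1)
        # Local windows of value vectors around each position: (b, t, d, k)
        windows = F.unfold(v, kernel_size=(k, 1), padding=(k // 2, 0))
        windows = windows.view(b, d, k, t).permute(0, 3, 1, 2)
        return torch.einsum("btdk,btk->btd", windows, kernels)

# Quick shape check
head = DynamicConvHead(dim=768)
print(head(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 768])
```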
arXiv Detail & Related papers (2020-08-06T07:43:19Z)
- Identifying Necessary Elements for BERT's Multilinguality [4.822598110892846]
Multilingual BERT (mBERT) yields high-quality multilingual representations and enables effective zero-shot transfer.
We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual.
arXiv Detail & Related papers (2020-05-01T14:27:14Z)
- A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
- What's so special about BERT's layers? A closer look at the NLP pipeline in monolingual and multilingual models [18.155121103400333]
We probe a Dutch BERT-based model and the multilingual BERT model for Dutch NLP tasks.
Through a deeper analysis of part-of-speech tagging, we show that, even within a given task, information is spread over different parts of the network.
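To make the probing setup concrete, here is a minimal sketch of layer-wise probes: one linear classifier per layer, each reading only that layer's token representations. The model name, tag-set size and Dutch example sentence are assumptions; the paper's full probing pipeline (training data, controls) is not reproduced.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# One linear probe per layer, each classifying tokens from that layer only.
name = "bert-base-multilingual-cased"      # or a monolingual Dutch BERT model
tok = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)
encoder.eval()

num_tags = 17                              # e.g. UPOS tags (assumption)
dim = encoder.config.hidden_size
n_layers = encoder.config.num_hidden_layers + 1   # +1 for the embedding layer
probes = nn.ModuleList(nn.Linear(dim, num_tags) for _ in range(n_layers))

enc = tok("Dit is een voorbeeldzin.", return_tensors="pt")   # Dutch example
with torch.no_grad():
    states = encoder(**enc, output_hidden_states=True).hidden_states

# Each probe would be trained (not shown) only on its own layer; here we just
# map every layer's token vectors to tag logits to show the wiring.
for layer, (h, probe) in enumerate(zip(states, probes)):
    logits = probe(h)                      # (1, seq_len, num_tags)
    print(f"layer {layer:2d} -> logits of shape {tuple(logits.shape)}")
```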
arXiv Detail & Related papers (2020-04-14T13:41:48Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
- BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT [53.63288887672302]
Bidirectional Encoder Representations from Transformers (BERT) have achieved tremendous success in many natural language processing (NLP) tasks.
We find that, surprisingly, the output layer of BERT can reconstruct the input sentence when it directly takes each hidden layer of BERT as input.
We propose a quite simple method to boost the performance of BERT.
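A small sketch of the reconstruction phenomenon described above: feed each hidden layer of a pretrained BERT directly into its masked-language-modeling output head and decode the argmax tokens. The English base model is used purely for illustration, and the boosting method proposed in the paper is not reproduced here.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

enc = tok("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

# Feed every hidden layer (embeddings + 12 transformer layers) straight into
# BERT's MLM output head and decode, to see how well each layer can be read
# out as the input sentence.
for i, h in enumerate(out.hidden_states):
    with torch.no_grad():
        logits = model.cls(h)              # (1, seq_len, vocab_size)
    ids = logits.argmax(dim=-1)[0]
    print(f"layer {i:2d}:", tok.decode(ids))
```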
arXiv Detail & Related papers (2020-01-25T13:35:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.