ur-iw-hnt at GermEval 2021: An Ensembling Strategy with Multiple BERT Models
- URL: http://arxiv.org/abs/2110.02042v1
- Date: Tue, 5 Oct 2021 13:48:20 GMT
- Title: ur-iw-hnt at GermEval 2021: An Ensembling Strategy with Multiple BERT Models
- Authors: Hoai Nam Tran and Udo Kruschwitz
- Abstract summary: We submitted three runs using an ensembling strategy by majority (hard) voting with multiple different BERT models.
All ensemble models outperform single models, while BERTweet is the winner of all individual models in every subtask.
Twitter-based models perform better than GermanBERT models, and multilingual models perform worse but by a small margin.
- Score: 5.952826555378035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes our approach (ur-iw-hnt) for the Shared Task of
GermEval2021 to identify toxic, engaging, and fact-claiming comments. We
submitted three runs using an ensembling strategy by majority (hard) voting
with multiple different BERT models of three different types: German-based,
Twitter-based, and multilingual models. All ensemble models outperform single
models, while BERTweet is the winner of all individual models in every subtask.
Twitter-based models perform better than GermanBERT models, and multilingual
models perform worse but by a small margin.
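The majority (hard) voting strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the model names and prediction lists are hypothetical placeholders standing in for the per-example labels produced by each fine-tuned BERT variant.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority (hard) voting over per-model label predictions.

    predictions: a list with one inner list of labels per model,
    all of equal length (one label per test example).
    """
    n_examples = len(predictions[0])
    voted = []
    for i in range(n_examples):
        # Count the label each model assigned to example i;
        # most_common(1) breaks ties by first-seen order.
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted

# Hypothetical binary predictions from three BERT variants
# (e.g., a German-based, a Twitter-based, and a multilingual model):
germanbert = [1, 0, 1, 0]
bertweet   = [1, 1, 1, 0]
xlm_r      = [0, 1, 1, 0]

print(hard_vote([germanbert, bertweet, xlm_r]))  # prints [1, 1, 1, 0]
```

With an odd number of models, a strict majority always exists for binary labels, which is one reason three models per ensemble is a natural choice for this kind of run.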
Related papers
- NeoDictaBERT: Pushing the Frontier of BERT models for Hebrew [5.9797303102481765]
We introduce NeoDictaBERT and NeoDictaBERT-bilingual: BERT-style models trained using the same architecture as NeoBERT, with a dedicated focus on Hebrew texts. We describe the training process and report results across various benchmarks. We release the models to the community as part of our goal to advance research and development in Hebrew NLP.
arXiv Detail & Related papers (2025-10-23T09:34:53Z)
- mmBERT: A Modern Multilingual Encoder with Annealed Language Learning [57.58071656545661]
mmBERT is an encoder-only language model pretrained on 3T tokens of multilingual text. We add over 1700 low-resource languages to the data mix only during the decay phase. We show that mmBERT significantly outperforms the previous generation of models on classification and retrieval tasks.
arXiv Detail & Related papers (2025-09-08T17:08:42Z)
- MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z)
- Evaluating Contextualized Language Models for Hungarian [0.0]
We compare huBERT, a Hungarian model, against 4 multilingual models, including the multilingual BERT model.
We find that huBERT works better than the other models, often by a large margin, particularly near the global optimum.
arXiv Detail & Related papers (2021-02-22T09:29:01Z)
- GottBERT: a pure German Language Model [0.0]
No German single-language RoBERTa model has yet been published; we introduce one in this work (GottBERT).
In an evaluation we compare its performance on the two Named Entity Recognition (NER) tasks CoNLL 2003 and GermEval 2014, as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD, against existing German single-language BERT models and two multilingual ones.
GottBERT was successfully pre-trained on a 256 core TPU pod using the RoBERTa BASE architecture.
arXiv Detail & Related papers (2020-12-03T17:45:03Z)
- Evaluating Multilingual BERT for Estonian [0.8057006406834467]
We evaluate four multilingual models -- multilingual BERT, multilingual distilled BERT, XLM and XLM-RoBERTa -- on several NLP tasks.
Our results show that multilingual BERT models can generalise well on different Estonian NLP tasks.
arXiv Detail & Related papers (2020-10-01T14:48:31Z)
- ConvBERT: Improving BERT with Span-based Dynamic Convolution [144.25748617961082]
BERT heavily relies on the global self-attention block and thus suffers large memory footprint and computation cost.
We propose a novel span-based dynamic convolution to replace these self-attention heads to directly model local dependencies.
The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
arXiv Detail & Related papers (2020-08-06T07:43:19Z)
- WikiBERT models: deep transfer learning for many languages [1.3455090151301572]
We introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data.
We assess the merits of these models using the state-of-the-art UDify on Universal Dependencies data.
arXiv Detail & Related papers (2020-06-02T11:57:53Z)
- lamBERT: Language and Action Learning Using Multimodal BERT [0.1942428068361014]
This study proposes lamBERT, a model for language and action learning using multimodal BERT.
Experiments are conducted in a grid environment that requires language understanding for the agent to act properly.
The lamBERT model obtained higher rewards in multitask settings and transfer settings when compared to other models.
arXiv Detail & Related papers (2020-04-15T13:54:55Z)
- Structure-Level Knowledge Distillation For Multilingual Sequence Labeling [73.40368222437912]
We propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) into the unified multilingual model (student).
Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
arXiv Detail & Related papers (2020-04-08T07:14:01Z)
- Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning [70.81910984985683]
We propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model.
The experiments on five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.
arXiv Detail & Related papers (2020-04-08T06:18:44Z)
- InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining [76.32065400614162]
We propose a novel model, namely InterBERT (BERT for Interaction), which is the first model of our series of multimodal pretraining methods M6.
The model has a strong capability for modeling interaction between the information flows of different modalities.
We propose a large-scale dataset for multi-modal pretraining in Chinese, and we develop the Chinese InterBERT which is the first Chinese multi-modal pretrained model.
arXiv Detail & Related papers (2020-03-30T03:13:22Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.