German's Next Language Model
- URL: http://arxiv.org/abs/2010.10906v4
- Date: Thu, 3 Dec 2020 11:02:43 GMT
- Title: German's Next Language Model
- Authors: Branden Chan, Stefan Schweter, Timo Möller
- Abstract summary: We present the experiments that led to the creation of our BERT- and ELECTRA-based German language models, GBERT and GELECTRA.
By varying the input training data, model size, and the presence of Whole Word Masking (WWM), we attain SoTA performance across a set of document classification and named entity recognition tasks.
- Score: 0.8520624117635327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work we present the experiments that led to the creation of our
BERT- and ELECTRA-based German language models, GBERT and GELECTRA. By varying
the input training data, model size, and the presence of Whole Word Masking
(WWM), we attain SoTA performance across a set of document classification and
named entity recognition (NER) tasks for both base and large model sizes. We
adopt an evaluation-driven approach in training these models, and our results
indicate that both adding more data and utilizing WWM improve model performance.
By benchmarking against existing German models, we show that these models are
the best German models to date. Our trained models will be made publicly
available to the research community.
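As a rough illustration of the Whole Word Masking mentioned above: the sketch below groups WordPiece sub-tokens (the "##" pieces) back into whole words and masks each selected word as a unit. The tokens and masking rate are illustrative only, not the GBERT/GELECTRA training configuration.

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Toy sketch of Whole Word Masking over WordPiece tokens.

    Sub-tokens that start with "##" are grouped with the preceding token,
    so a chosen word is always masked in its entirety.
    """
    # Group sub-token indices into whole words.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    masked = list(tokens)
    for word in words:
        if random.random() < mask_prob:
            for i in word:
                masked[i] = mask_token
    return masked

# Example: "Sprachmodell" is split into two WordPiece sub-tokens.
print(whole_word_mask(["Das", "Sprach", "##modell", "lernt", "schnell", "."]))
```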
Related papers
- LLäMmlein: Compact and Competitive German-Only Language Models from Scratch [3.7160688974577156]
We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use.
The model training involved several key steps, including extensive data preprocessing, the creation of a custom German tokenizer, the training itself, and the evaluation of the final models on various benchmarks.
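As a hedged sketch of what training a custom German tokenizer can look like, the snippet below uses the Hugging Face `tokenizers` library; the corpus file, vocabulary size, and special tokens are placeholders, not the settings used for LLäMmlein.

```python
# Minimal byte-level BPE tokenizer training sketch; "german_corpus.txt" is a
# hypothetical plain-text corpus and 32k is an assumed vocabulary size.
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,
    special_tokens=["<s>", "</s>", "<pad>"],
)
tokenizer.train(files=["german_corpus.txt"], trainer=trainer)
tokenizer.save("german_bpe.json")

print(tokenizer.encode("Grüße aus München!").tokens)
```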
arXiv Detail & Related papers (2024-11-17T20:44:34Z)
- Apple Intelligence Foundation Language Models [109.60033785567484]
This report describes the model architecture, the data used to train the model, the training process, and the evaluation results.
We highlight our focus on Responsible AI and how the principles are applied throughout the model development.
arXiv Detail & Related papers (2024-07-29T18:38:49Z)
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Confidence-based Ensembles of End-to-End Speech Recognition Models [71.65982591023581]
We show that a confidence-based ensemble of 5 monolingual models outperforms a system where model selection is performed via a dedicated language identification block.
We also demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data.
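A minimal sketch of the confidence-based selection idea, assuming each monolingual model exposes a transcript together with a confidence score; the interface below is hypothetical and far simpler than the paper's ASR pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Hypothesis:
    text: str
    confidence: float  # e.g. an average token log-probability mapped to [0, 1]

def confidence_ensemble(
    audio: object,
    models: Sequence[Callable[[object], Hypothesis]],
) -> Hypothesis:
    """Run every monolingual model on the audio and keep the most confident
    transcript, instead of routing through a language-identification block."""
    hypotheses = [model(audio) for model in models]
    return max(hypotheses, key=lambda h: h.confidence)
```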
arXiv Detail & Related papers (2023-06-27T23:13:43Z)
- Data-Efficient French Language Modeling with CamemBERTa [0.0]
We introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.
We evaluate our model's performance on a variety of French downstream tasks and datasets.
arXiv Detail & Related papers (2023-06-02T12:45:34Z)
- DUMB: A Benchmark for Smart Evaluation of Dutch Models [23.811515104842826]
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks.
Relative Error Reduction (RER) compares the DUMB performance of language models to a strong baseline.
The highest performance is achieved by DeBERTaV3 (large), XLM-R (large), and mDeBERTaV3 (base).
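For orientation, a small sketch of Relative Error Reduction as it is commonly defined; it assumes accuracy-style scores in [0, 1], and the exact baselines and task metrics are specified in the DUMB paper itself.

```python
def relative_error_reduction(model_score: float, baseline_score: float) -> float:
    """Fraction of the baseline's error that the model removes."""
    baseline_error = 1.0 - baseline_score
    model_error = 1.0 - model_score
    return (baseline_error - model_error) / baseline_error

# A model scoring 0.95 against a 0.90 baseline halves the error: RER = 0.5.
print(relative_error_reduction(0.95, 0.90))
```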
arXiv Detail & Related papers (2023-05-22T13:27:37Z)
- METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
We propose a recipe, namely the "Model generated dEnoising TRaining Objective" (METRO).
The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
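A toy sketch of the general idea of model-generated training signals, in the spirit of ELECTRA-style corruption: an auxiliary model fills masked positions, and the main model is trained to detect (and correct) the replaced tokens. The callable interface and masking rate below are assumptions, not METRO's actual recipe.

```python
import random
from typing import Callable, List, Tuple

def model_generated_denoising_step(
    tokens: List[str],
    auxiliary_fill: Callable[[List[str], int], str],  # proposes a token for position i
    mask_prob: float = 0.15,
) -> Tuple[List[str], List[int]]:
    """Corrupt a sequence with an auxiliary model's proposals and emit
    replaced-token labels as the training signal for the main model."""
    corrupted = list(tokens)
    replaced_labels = [0] * len(tokens)
    for i in range(len(tokens)):
        if random.random() < mask_prob:
            proposal = auxiliary_fill(corrupted, i)
            if proposal != tokens[i]:
                corrupted[i] = proposal
                replaced_labels[i] = 1
    return corrupted, replaced_labels
```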
arXiv Detail & Related papers (2022-04-13T21:39:15Z)
- Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages [0.0]
We train a trilingual LitLat BERT-like model for Lithuanian, Latvian, and English, and a monolingual Est-RoBERTa model for Estonian.
We evaluate their performance on four downstream tasks: named entity recognition, dependency parsing, part-of-speech tagging, and word analogy.
arXiv Detail & Related papers (2021-12-20T14:26:40Z)
- Interpreting Language Models Through Knowledge Graph Extraction [42.97929497661778]
We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa).
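A much-simplified probe in the spirit of this methodology, using the `transformers` fill-mask pipeline; the cloze templates, relation names, and model checkpoint are illustrative, not the paper's setup.

```python
from transformers import pipeline

# Probe a pretrained BERT with cloze statements and record the top completion
# as a candidate (subject, relation, object) triple with its score.
fill = pipeline("fill-mask", model="bert-base-uncased")

cloze_templates = [
    ("Berlin", "capital-of", "Berlin is the capital of [MASK]."),
    ("Goethe", "born-in", "Goethe was born in [MASK]."),
]

triples = []
for subject, relation, statement in cloze_templates:
    top = fill(statement)[0]  # highest-scoring completion
    triples.append((subject, relation, top["token_str"], top["score"]))

print(triples)
```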
arXiv Detail & Related papers (2021-11-16T15:18:01Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
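A toy reading of the Dynamic Blocking constraint as we understand it from the abstract: if the decoder has just emitted a token that also appears in the source, the token that follows it in the source is blocked at the next step, which discourages verbatim copying. This standalone function is only a sketch, not the authors' decoding algorithm.

```python
import math
from typing import Dict, List

def dynamic_blocking_mask(
    source_ids: List[int],
    last_generated_id: int,
    logits: Dict[int, float],
    block_value: float = -math.inf,
) -> Dict[int, float]:
    """Return a copy of the logits in which every source token that directly
    follows an occurrence of the just-generated token is blocked."""
    blocked = dict(logits)
    for pos, tok in enumerate(source_ids[:-1]):
        if tok == last_generated_id:
            blocked[source_ids[pos + 1]] = block_value
    return blocked
```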
arXiv Detail & Related papers (2020-10-24T11:55:28Z)