NorBench -- A Benchmark for Norwegian Language Models
- URL: http://arxiv.org/abs/2305.03880v1
- Date: Sat, 6 May 2023 00:20:24 GMT
- Title: NorBench -- A Benchmark for Norwegian Language Models
- Authors: David Samuel, Andrey Kutuzov, Samia Touileb, Erik Velldal, Lilja Øvrelid, Egil Rønningstad, Elina Sigdel and Anna Palatkina
- Abstract summary: We present NorBench: a suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics.
We also introduce a range of new Norwegian language models (both encoder and encoder-decoder based).
We compare and analyze their performance, along with other existing LMs, across the different benchmark tests of NorBench.
- Score: 7.395163289937936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present NorBench: a streamlined suite of NLP tasks and probes for
evaluating Norwegian language models (LMs) on standardized data splits and
evaluation metrics. We also introduce a range of new Norwegian language models
(both encoder and encoder-decoder based). Finally, we compare and analyze their
performance, along with other existing LMs, across the different benchmark
tests of NorBench.
Related papers
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages [49.38663048447942]
We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
arXiv Detail & Related papers (2024-03-16T20:18:36Z)
- NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian [4.062031248854444]
Norwegian, spoken by a population of only 5 million, is under-represented in the most impressive breakthroughs in NLP tasks.
To fill this gap, we compiled the existing Norwegian dataset and pre-trained 4 Norwegian Open Language Models.
We find that the mainstream, English-dominated LM GPT-3.5 has limited capability in understanding the Norwegian context.
arXiv Detail & Related papers (2023-12-03T08:09:45Z)
- NoCoLA: The Norwegian Corpus of Linguistic Acceptability [2.538209532048867]
We present two new Norwegian datasets for evaluating language models.
NoCoLA_class is a supervised binary classification task where the goal is to discriminate between acceptable and non-acceptable sentences.
NoCoLA_zero is a purely diagnostic task for evaluating the grammatical judgement of a language model in a completely zero-shot manner.
arXiv Detail & Related papers (2023-06-13T14:11:19Z)
- ScandEval: A Benchmark for Scandinavian Natural Language Processing [0.0]
This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained model on four different tasks in the Scandinavian languages.
The datasets used in two of the tasks, linguistic acceptability and question answering, are new.
We develop and release a Python package and command-line interface, scandeval, which can benchmark any model that has been uploaded to the Hugging Face Hub, with reproducible results.
arXiv Detail & Related papers (2023-04-03T11:51:46Z)
- FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z)
- GEMv2: Multilingual NLG Benchmarking in a Single Line of Code [161.1761414080574]
The new version of the Generation, Evaluation, and Metrics Benchmark (GEMv2) introduces a modular infrastructure for dataset, model, and metric developers.
GEMv2 supports 40 documented datasets in 51 languages.
Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
arXiv Detail & Related papers (2022-06-22T17:52:30Z)
- Quality-Aware Decoding for Neural Machine Translation [64.24934199944875]
We propose quality-aware decoding for neural machine translation (NMT).
We leverage recent breakthroughs in reference-free and reference-based MT evaluation through various inference methods.
We find that quality-aware decoding consistently outperforms MAP-based decoding according both to state-of-the-art automatic metrics and to human assessments.
arXiv Detail & Related papers (2022-05-02T15:26:28Z)
- Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models [53.95094814056337]
This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models.
The new version includes a number of technical, user experience and methodological improvements.
We provide the integration of Russian SuperGLUE with a framework for industrial evaluation of the open-source models, MOROCCO.
arXiv Detail & Related papers (2022-02-15T23:45:30Z)
- MOROCCO: Model Resource Comparison Framework [61.444083353087294]
We present MOROCCO, a framework to compare language models compatible with the jiant environment, which supports over 50 NLU tasks.
We demonstrate its applicability for two GLUE-like suites in different languages.
arXiv Detail & Related papers (2021-04-29T13:01:27Z)
- Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model [0.0]
We show the process of building a large-scale training set from digital and digitized collections at a national library.
The resulting Bidirectional Encoder Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models in several token and sequence classification tasks.
arXiv Detail & Related papers (2021-04-19T20:36:24Z)
- Large-Scale Contextualised Language Modelling for Norwegian [7.5722195869569]
This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks.
In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian.
arXiv Detail & Related papers (2021-04-13T23:18:04Z)