Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP
models
- URL: http://arxiv.org/abs/2202.07791v1
- Date: Tue, 15 Feb 2022 23:45:30 GMT
- Authors: Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Tatiana
Shavrina, Anton Emelyanov, Denis Shevelev, Alexandr Kukushkin, Valentin
Malykh, Ekaterina Artemova
- Abstract summary: This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models.
The new version includes a number of technical, user experience and methodological improvements.
We provide the integration of Russian SuperGLUE with MOROCCO, a framework for the industrial evaluation of open-source models.
- Score: 53.95094814056337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last year, new neural architectures and multilingual pre-trained
models have been released for Russian, which led to performance evaluation
problems across a range of language understanding tasks.
This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after
GLUE for Russian NLP models. The new version includes a number of technical,
user experience and methodological improvements, including fixes of the
benchmark vulnerabilities unresolved in the previous version: novel and
improved tests for understanding the meaning of a word in context (RUSSE) along
with reading comprehension and common sense reasoning (DaNetQA, RuCoS, MuSeRC).
Together with the release of the updated datasets, we improve the benchmark toolkit, based on the jiant framework, for consistent training and evaluation of NLP models of various architectures; it now supports the most recent models for Russian. Finally, we provide the integration of Russian SuperGLUE with a framework for the industrial evaluation of open-source models,
MOROCCO (MOdel ResOurCe COmparison), in which models are evaluated according to a weighted average metric over all tasks, inference speed, and the amount of RAM occupied. Russian SuperGLUE is publicly available at
https://russiansuperglue.com/.
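The abstract describes MOROCCO as combining a weighted average of task metrics with inference speed and RAM footprint. A minimal sketch of that kind of aggregation is shown below; the exact formula MOROCCO uses is not given here, so the weighting scheme, field names, and example numbers are illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    # Hypothetical per-model measurements; field names are illustrative,
    # not MOROCCO's actual schema.
    task_scores: dict       # task name -> metric in [0, 1]
    samples_per_sec: float  # inference throughput
    ram_gb: float           # peak RAM occupied during inference

def morocco_style_score(p: ModelProfile, weights: dict) -> float:
    """Weighted average over tasks, scaled by throughput and penalized
    by RAM footprint (an assumed, illustrative aggregation)."""
    total_w = sum(weights[t] for t in p.task_scores)
    quality = sum(p.task_scores[t] * weights[t] for t in p.task_scores) / total_w
    # Reward faster inference and smaller memory footprint.
    return quality * p.samples_per_sec / p.ram_gb

weights = {"RUSSE": 1.0, "DaNetQA": 1.0, "MuSeRC": 2.0}
model = ModelProfile(
    task_scores={"RUSSE": 0.72, "DaNetQA": 0.65, "MuSeRC": 0.70},
    samples_per_sec=40.0,
    ram_gb=8.0,
)
print(round(morocco_style_score(model, weights), 3))
```

The design point this illustrates is that two models with identical task quality can rank differently once resource cost enters the aggregate, which is the trade-off the MOROCCO integration is meant to surface.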
Related papers
- Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian [46.76757653630145]
Vikhr is a new state-of-the-art open-source instruction-tuned LLM for the Russian language.
Vikhr features an adapted tokenizer vocabulary and undergoes continued pre-training and instruction tuning of all weights.
Vikhr not only sets a new state of the art among open-source LLMs for Russian, but even outperforms some proprietary closed-source models on certain benchmarks.
arXiv Detail & Related papers (2024-05-22T18:58:58Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming language models with nearly 4x more parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415]
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to-text, en-pl translation, summarization, and question answering.
We present plT5 - a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective.
arXiv Detail & Related papers (2022-05-18T09:17:14Z)
- Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks [2.6189995284654737]
Leaderboards like SuperGLUE are seen as important incentives for the active development of NLP.
We show that its test datasets are vulnerable to shallow heuristics.
It is likely (as the simplest explanation) that a significant part of the SOTA models' performance on the RSG leaderboard is due to exploiting these shallow heuristics.
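The kind of shallow heuristic this paper describes can be illustrated with a toy sketch. The keyword rule and the example items below are invented for illustration; they are not drawn from the actual RSG test sets, whose exploitable cues are subtler.

```python
# Toy illustration of a shallow heuristic on a yes/no QA task:
# answer "no" whenever the question contains a negation, else "yes".
# The rule and the items are invented for this sketch.
def shallow_baseline(question: str) -> str:
    return "no" if " not " in f" {question.lower()} " else "yes"

items = [
    ("Is Moscow the capital of Russia?", "yes"),
    ("Is the Volga not a river?", "no"),
    ("Do fish breathe with lungs?", "no"),  # the heuristic fails here
]
correct = sum(shallow_baseline(q) == gold for q, gold in items)
print(f"{correct}/{len(items)} correct with no language understanding at all")
```

A rule like this scores well whenever surface cues correlate with labels, which is exactly the benchmark vulnerability the RSG 1.1 dataset fixes are meant to close.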
arXiv Detail & Related papers (2021-05-03T22:19:22Z)
- MOROCCO: Model Resource Comparison Framework [61.444083353087294]
We present MOROCCO, a framework to compare language models compatible with the jiant environment, which supports over 50 NLU tasks.
We demonstrate its applicability for two GLUE-like suites in different languages.
arXiv Detail & Related papers (2021-04-29T13:01:27Z)
- RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark [5.258267224004844]
We introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE.
For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, was developed from scratch for the Russian language.
arXiv Detail & Related papers (2020-10-29T20:31:39Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- COMET: A Neural Framework for MT Evaluation [8.736370689844682]
We present COMET, a neural framework for training multilingual machine translation evaluation models.
Our framework exploits information from both the source input and a target-language reference translation in order to more accurately predict MT quality.
Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.
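COMET's use of both the source and the reference can be sketched as a feature-combination step feeding a quality-regression head. The vectors below are dummy stand-ins for encoder sentence embeddings, and the feature layout is a simplified assumption about the estimator's input, not the library's exact implementation.

```python
def combine_features(src, ref, hyp):
    """COMET-style input features for a quality-regression head:
    the hypothesis embedding concatenated with element-wise products
    and absolute differences against both reference and source.
    (Simplified sketch; not COMET's exact layout.)"""
    prod = lambda a, b: [x * y for x, y in zip(a, b)]
    diff = lambda a, b: [abs(x - y) for x, y in zip(a, b)]
    return (hyp + ref
            + prod(hyp, ref) + diff(hyp, ref)
            + prod(hyp, src) + diff(hyp, src))

# Dummy 3-dim "sentence embeddings" standing in for encoder outputs.
src = [0.1, 0.2, 0.3]
ref = [0.2, 0.1, 0.4]
hyp = [0.2, 0.2, 0.3]
features = combine_features(src, ref, hyp)
print(len(features))  # 6 segments of 3 dims each -> 18 features
```

The point of the combination is that the source term lets the model score a translation even when the hypothesis diverges lexically from the reference but still conveys the source meaning.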
arXiv Detail & Related papers (2020-09-18T18:54:15Z)
- KLEJ: Comprehensive Benchmark for Polish Language Understanding [4.702729080310267]
We introduce a comprehensive multi-task benchmark for Polish language understanding, accompanied by an online leaderboard.
We also release HerBERT, a Transformer-based model trained specifically for the Polish language, which has the best average performance and obtains the best results for three out of nine tasks.
arXiv Detail & Related papers (2020-05-01T21:55:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.