RuSemShift: a dataset of historical lexical semantic change in Russian
- URL: http://arxiv.org/abs/2010.06436v1
- Date: Tue, 13 Oct 2020 14:54:05 GMT
- Title: RuSemShift: a dataset of historical lexical semantic change in Russian
- Authors: Julia Rodina, Andrey Kutuzov
- Abstract summary: We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian.
Target words were annotated by multiple crowd-source workers.
We report the performance of several distributional approaches on RuSemShift, achieving promising results.
- Score: 3.261599248682794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present RuSemShift, a large-scale manually annotated test set for the task
of semantic change modeling in Russian for two long-term time period pairs:
from the pre-Soviet through the Soviet times and from the Soviet through the
post-Soviet times. Target words were annotated by multiple crowd-source
workers. The annotation process was organized following the DURel framework and
was based on sentence contexts extracted from the Russian National Corpus.
Additionally, we report the performance of several distributional approaches on
RuSemShift, achieving promising results, which at the same time leave room for
other researchers to improve.
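The abstract does not say which distributional methods or evaluation metric were used. As an illustration only, the sketch below assumes one common setup: each target word has one vector per time period (already in a shared space), the model score is the cosine similarity between the two vectors, and that score is compared with the mean human relatedness rating using Spearman rank correlation. All names, vectors, and ratings here are hypothetical toy values, not data from RuSemShift.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical toy vectors for three placeholder words in two periods.
earlier = {"word_a": [1.0, 0.0], "word_b": [1.0, 1.0], "word_c": [0.0, 1.0]}
later = {"word_a": [1.0, 0.1], "word_b": [0.0, 1.0], "word_c": [1.0, 0.0]}

# Hypothetical mean relatedness ratings (higher = meaning more stable).
gold_relatedness = {"word_a": 3.8, "word_b": 2.5, "word_c": 1.2}

words = sorted(gold_relatedness)
predicted = [cosine_similarity(earlier[w], later[w]) for w in words]
gold = [gold_relatedness[w] for w in words]

# For this toy data the two rankings agree, so rho is close to 1.0.
rho = spearman(predicted, gold)
print(f"Spearman rho = {rho:.3f}")
```

In practice the vectors would come from embedding models trained on each period's subcorpus, and a library routine such as `scipy.stats.spearmanr` could replace the hand-written correlation.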
Related papers
- Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years.
We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives.
We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
- Semantic Change Detection for the Romanian Language [0.5202524136984541]
We analyze different strategies to create static and contextual word embedding models on real-world datasets.
We first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA) and then on a Romanian dataset.
The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance measure used to score semantic change.
arXiv Detail & Related papers (2023-08-23T13:37:02Z)
- A big data approach towards sarcasm detection in Russian [0.0]
We present a set of deterministic algorithms for Russian inflection and automated text synthesis.
These algorithms are implemented in a publicly available web-service www.passare.ru.
arXiv Detail & Related papers (2023-06-01T08:34:26Z)
- Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation [70.58243648754507]
We introduce a new method to improve existing multilingual sentence embeddings with Abstract Meaning Representation (AMR).
Compared with the original textual input, AMR is a structured semantic representation that presents the core concepts and relations in a sentence explicitly and unambiguously.
Experiment results show that retrofitting multilingual sentence embeddings with AMR leads to better state-of-the-art performance on both semantic similarity and transfer tasks.
arXiv Detail & Related papers (2022-10-18T11:37:36Z)
- Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning [92.07643510310766]
Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We empirically find that existing methods fail to generalize to queries with novel combinations of seen words.
We propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies.
arXiv Detail & Related papers (2022-03-24T12:55:23Z)
- Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models [53.95094814056337]
This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models.
The new version includes a number of technical, user experience and methodological improvements.
We provide the integration of Russian SuperGLUE with a framework for industrial evaluation of the open-source models, MOROCCO.
arXiv Detail & Related papers (2022-02-15T23:45:30Z)
- Three-part diachronic semantic change dataset for Russian [4.7566046630595755]
We present a manually annotated lexical semantic change dataset for Russian: RuShiftEval.
Its novelty is ensured by a single set of target words annotated for their diachronic semantic shifts across three time periods.
arXiv Detail & Related papers (2021-06-15T17:12:25Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- ELMo and BERT in semantic change detection for Russian [4.389735175149927]
We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data.
Evaluation test sets consist of Russian nouns and adjectives annotated based on their occurrences in texts created in pre-Soviet, Soviet and post-Soviet time periods.
arXiv Detail & Related papers (2020-10-07T15:34:00Z)
- Dataset for Automatic Summarization of Russian News [0.0]
We present Gazeta, the first dataset for summarization of Russian news.
We demonstrate that the dataset is a valid task for methods of text summarization for Russian.
arXiv Detail & Related papers (2020-06-19T10:44:06Z)
- RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian language [70.27072729280528]
This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.
16 teams participated in the task, demonstrating high results, with more than half of them outperforming the provided baseline.
arXiv Detail & Related papers (2020-05-22T13:30:37Z)