Three-part diachronic semantic change dataset for Russian
- URL: http://arxiv.org/abs/2106.08294v1
- Date: Tue, 15 Jun 2021 17:12:25 GMT
- Title: Three-part diachronic semantic change dataset for Russian
- Authors: Andrey Kutuzov, Lidia Pivovarova
- Abstract summary: We present a manually annotated lexical semantic change dataset for Russian: RuShiftEval.
Its novelty is ensured by a single set of target words annotated for their diachronic semantic shifts across three time periods.
- Score: 4.7566046630595755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a manually annotated lexical semantic change dataset for Russian:
RuShiftEval. Its novelty is ensured by a single set of target words annotated
for their diachronic semantic shifts across three time periods, while the
previous work either used only two time periods, or different sets of target
words. The paper describes the composition and annotation procedure for the
dataset. In addition, it is shown how the ternary nature of RuShiftEval makes
it possible to trace specific diachronic trajectories: "changed at a particular
time period and stable afterwards" or "was changing throughout all time
periods". Based on the analysis of the submissions to the recent shared task on
semantic change detection for Russian, we argue that correctly identifying such
trajectories can be an interesting sub-task in itself.
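The trajectory idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `label_trajectory`, the per-transition change scores, the label strings, and the 0.5 threshold are all hypothetical choices made for illustration. It assumes each target word has one change score per transition between consecutive time periods, where a higher score means more change.

```python
def label_trajectory(change_1_2: float, change_2_3: float,
                     threshold: float = 0.5) -> str:
    """Label a word's trajectory across three time periods.

    change_1_2 and change_2_3 are hypothetical change scores for the
    first and second transition (higher means more change). The
    threshold is an arbitrary illustrative cut-off, not a value taken
    from the dataset.
    """
    changed_early = change_1_2 >= threshold
    changed_late = change_2_3 >= threshold

    if changed_early and changed_late:
        return "changing_throughout"
    if changed_early:
        return "changed_then_stable"
    if changed_late:
        return "stable_then_changed"
    return "stable"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    for scores in [(0.8, 0.1), (0.7, 0.9), (0.2, 0.6), (0.1, 0.2)]:
        print(scores, label_trajectory(*scores))
```

A hard threshold is the simplest possible design; with graded annotations, a ranking-based or per-word comparison of the two transitions might be more appropriate.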
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Semantic Change Detection for the Romanian Language [0.5202524136984541]
We analyze different strategies to create static and contextual word embedding models on real-world datasets.
We first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA) and then on a Romanian dataset.
The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance measure used to calculate a semantic change score.
arXiv Detail & Related papers (2023-08-23T13:37:02Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning [92.07643510310766]
Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We empirically find that they fail to generalize to queries with novel combinations of seen words.
We propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies.
arXiv Detail & Related papers (2022-03-24T12:55:23Z)
- NorDiaChange: Diachronic Semantic Change Dataset for Norwegian [63.65426535861836]
NorDiaChange is the first diachronic semantic change dataset for Norwegian.
It covers about 80 Norwegian nouns manually annotated with graded semantic change over time.
arXiv Detail & Related papers (2022-01-13T18:27:33Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- RuSemShift: a dataset of historical lexical semantic change in Russian [3.261599248682794]
We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian.
Target words were annotated by multiple crowd-source workers.
We report the performance of several distributional approaches on RuSemShift, achieving promising results.
arXiv Detail & Related papers (2020-10-13T14:54:05Z)
- ELMo and BERT in semantic change detection for Russian [4.389735175149927]
We study the effectiveness of contextualized embeddings for the task of diachronic semantic change detection for Russian language data.
Evaluation test sets consist of Russian nouns and adjectives annotated based on their occurrences in texts created in pre-Soviet, Soviet and post-Soviet time periods.
arXiv Detail & Related papers (2020-10-07T15:34:00Z)
- SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements in the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages.
Our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.
arXiv Detail & Related papers (2020-10-02T08:38:40Z)