Semantic change detection for Slovene language: a novel dataset and an
approach based on optimal transport
- URL: http://arxiv.org/abs/2402.16596v1
- Date: Mon, 26 Feb 2024 14:27:06 GMT
- Title: Semantic change detection for Slovene language: a novel dataset and an
approach based on optimal transport
- Authors: Marko Pranjić (1 and 2), Kaja Dobrovoljc (1), Senja Pollak (1),
Matej Martinc (1) ((1) Jožef Stefan Institute, Ljubljana, Slovenia, (2)
Jožef Stefan International Postgraduate School, Ljubljana, Slovenia)
- Abstract summary: We focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers.
We present the first Slovene dataset for evaluating semantic change detection systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we focus on the detection of semantic changes in Slovene, a
less resourced Slavic language with two million speakers. Detecting and
tracking semantic changes provides insights into the evolution of the language
caused by changes in society and culture. Recently, several systems have been
proposed to aid in this study, but all depend on manually annotated gold
standard datasets for evaluation. In this paper, we present the first Slovene
dataset for evaluating semantic change detection systems, which contains
aggregated semantic change scores for 104 target words obtained from more than
3000 manually annotated sentence pairs. We evaluate several existing semantic
change detection methods on this dataset and also propose a novel approach
based on optimal transport that improves on the existing state-of-the-art
systems with an error reduction rate of 22.8%.
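The abstract does not spell out how optimal transport is used, so the following is only a minimal sketch of one way such a change score could be computed, not the authors' method. It assumes each target word is represented by two sets of contextual embedding vectors (one per time period), treats each set as a uniform distribution, and uses the entropy-regularised (Sinkhorn) transport cost under cosine distance as the change score. The function names, the regularisation value, and the toy vectors are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sinkhorn_change_score(old_vecs, new_vecs, reg=0.1, n_iters=200):
    """Entropy-regularised optimal transport cost between two embedding sets.

    Assumption (not from the paper): every usage gets equal weight, and the
    ground cost is cosine distance. A higher score suggests more change.
    """
    n, m = len(old_vecs), len(new_vecs)
    cost = [[cosine_distance(x, y) for y in new_vecs] for x in old_vecs]
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform weights over old-period usages
    b = [1.0 / m] * m  # uniform weights over new-period usages
    u = [1.0] * n
    v = [1.0] * m
    # Sinkhorn iterations: alternately rescale rows and columns of the kernel
    # so that the transport plan matches both marginals.
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P[i][j] = u[i] * K[i][j] * v[j]; the score is <P, cost>.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Toy 2-D "embeddings" (hypothetical): a word whose usages stay the same,
# and a word whose usages move to an orthogonal direction.
stable_score = sinkhorn_change_score([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
shifted_score = sinkhorn_change_score([[1.0, 0.0], [1.0, 0.0]],
                                      [[0.0, 1.0], [0.0, 1.0]])
```

In this toy setup the unchanged word receives a score close to zero and the fully shifted word a score close to one. A real system would replace the toy vectors with embeddings from a contextual language model and compare the resulting scores against the dataset's aggregated gold scores.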
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- Graph-based Clustering for Detecting Semantic Change Across Time and Languages [10.058655884092094]
We propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages.
Our approach substantially surpasses previous approaches in the SemEval 2020 binary classification task across four languages.
arXiv Detail & Related papers (2024-02-01T21:27:19Z)
- Semantic Change Detection for the Romanian Language [0.5202524136984541]
We analyze different strategies to create static and contextual word embedding models on real-world datasets.
We first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA) and then on a Romanian dataset.
The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance to calculate a score for detecting semantic change.
arXiv Detail & Related papers (2023-08-23T13:37:02Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Contextualized language models for semantic change detection: lessons learned [4.436724861363513]
We present a qualitative analysis of the outputs of contextualized embedding-based methods for detecting diachronic semantic change.
Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift.
Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance.
arXiv Detail & Related papers (2022-08-31T23:35:24Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Lexical Semantic Change Discovery [22.934650688233734]
We propose a shift from change detection to change discovery, i.e., discovering novel word senses over time from the full corpus vocabulary.
By heavily fine-tuning a type-based and a token-based approach on recently published German data, we demonstrate that both models can successfully be applied to discover new words undergoing meaning change.
arXiv Detail & Related papers (2021-06-06T13:02:38Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Morphologically Aware Word-Level Translation [82.59379608647147]
We propose a novel morphologically aware probability model for bilingual lexicon induction.
Our model exploits the basic linguistic intuition that the lexeme is the key lexical unit of meaning.
arXiv Detail & Related papers (2020-11-15T17:54:49Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of works: zero-shot approach and translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection [10.606357227329822]
Evaluation is currently the most pressing problem in Lexical Semantic Change detection.
No gold standards are available to the community, which hinders progress.
We present the results of the first shared task that addresses this gap.
arXiv Detail & Related papers (2020-07-22T14:37:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.