Fake it Till You Make it: Self-Supervised Semantic Shifts for
Monolingual Word Embedding Tasks
- URL: http://arxiv.org/abs/2102.00290v1
- Date: Sat, 30 Jan 2021 18:59:43 GMT
- Authors: Maurício Gruppi, Sibel Adalı, Pin-Yu Chen
- Abstract summary: We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
- Score: 58.87961226278285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of language is subject to variation over time as well as across
social groups and knowledge domains, leading to differences even in the
monolingual scenario. Such variation in word usage is often called lexical
semantic change (LSC). The goal of LSC is to characterize and quantify
variations in language with respect to word meaning, so as to measure how
distinct two language sources (that is, two groups of people or two language
models) are. Because hardly any labeled data is available for this task, most
solutions rely on unsupervised methods that align two embedding spaces and
predict semantic change according to a distance measure. To that end, we
propose a self-supervised approach that models lexical semantic change by
generating training samples through perturbations of word vectors in the
input corpora. We show that our method can be used for the
detection of semantic change with any alignment method. Furthermore, it can be
used to choose the landmark words to use in alignment and can lead to
substantial improvements over the existing techniques for alignment.
We illustrate the utility of our techniques using experimental results on
three different datasets, involving words with the same or different meanings.
Our methods not only provide significant improvements but also can lead to
novel findings for the LSC problem.
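The pipeline sketched in the abstract can be illustrated with a toy experiment. The snippet below is a minimal sketch, not the authors' implementation: all names, dimensions, and the perturbation scale are illustrative assumptions. It fakes semantic shifts by perturbing a few word vectors in a second embedding space, aligns the spaces with orthogonal Procrustes (one common alignment choice in LSC work), and checks that an alignment-based cosine distance separates the perturbed words from the rest.

```python
# Illustrative sketch of perturbation-based self-supervision for LSC.
# Assumptions (not from the paper): toy random embeddings, 3 perturbed
# target words, perturbation scale 2.0, Procrustes alignment.
import numpy as np

rng = np.random.default_rng(0)

def procrustes_align(A, B):
    """Find the orthogonal map W minimizing ||A @ W - B|| and apply it."""
    U, _, Vt = np.linalg.svd(B.T @ A)
    W = (U @ Vt).T  # orthogonal rotation taking A's space onto B's
    return A @ W

def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy vocabulary: 100 words, 50-dimensional embeddings for "corpus 1".
V, d = 100, 50
emb1 = rng.standard_normal((V, d))

# "Corpus 2": the same space up to a random rotation plus small noise.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
emb2 = emb1 @ Q + 0.01 * rng.standard_normal((V, d))

# Self-supervision step: inject artificial semantic change for a few
# target words by strongly perturbing their corpus-2 vectors.
shifted = [3, 7, 42]
emb2[shifted] += 2.0 * rng.standard_normal((len(shifted), d))

aligned = procrustes_align(emb1, emb2)
scores = np.array([cosine_distance(aligned[i], emb2[i]) for i in range(V)])

# Every artificially shifted word should score above the typical word.
print(scores[shifted].min() > np.median(scores))
```

Because the perturbed words are known by construction, they serve as free positive labels: one can tune the alignment, choose landmark words, or calibrate a change threshold against them without any human annotation.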
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Definition generation for lexical semantic change detection [3.7297237438000788]
We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD).
In short, generated definitions are used as 'senses', and the change score of a target word is obtained by comparing their distributions in the two time periods under comparison.
Our approach is on par with or outperforms prior unsupervised LSCD methods.
arXiv Detail & Related papers (2024-06-20T10:13:08Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Grammatical Profiling for Semantic Change Detection [6.3596637237946725]
We use grammatical profiling as an alternative method for semantic change detection.
We demonstrate that it can be used for semantic change detection and even outperforms some distributional semantic methods.
arXiv Detail & Related papers (2021-09-21T18:38:18Z)
- SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change [58.87961226278285]
This paper describes SChME, a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change.
SChME uses a model ensemble combining signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature.
arXiv Detail & Related papers (2020-12-02T23:56:34Z)
- Word Embeddings: Stability and Semantic Change [0.0]
We present an experimental study on the instability of the training process of three of the most influential embedding techniques of the last decade: word2vec, GloVe and fastText.
We propose a statistical model to describe the instability of embedding techniques and introduce a novel metric to measure the instability of the representation of an individual word.
arXiv Detail & Related papers (2020-07-23T16:03:50Z)
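The per-word instability that the last entry studies can be sketched with a simple, commonly used proxy: compare a word's nearest neighbors across two training runs of the same embedding method. This is an illustrative assumption, not the metric proposed in that paper; the runs are simulated here with random embeddings plus retraining noise.

```python
# Illustrative sketch (not the cited paper's metric): per-word instability
# as 1 minus the overlap of a word's k nearest neighbors across two runs.
import numpy as np

rng = np.random.default_rng(1)
V, d, k = 200, 32, 10

run_a = rng.standard_normal((V, d))
run_b = run_a + 0.1 * rng.standard_normal((V, d))  # a slightly different retraining

def knn(emb, i, k):
    """Indices of the k most cosine-similar words to word i in emb."""
    sims = emb @ emb[i] / (np.linalg.norm(emb, axis=1) * np.linalg.norm(emb[i]))
    sims[i] = -np.inf  # exclude the word itself
    return set(np.argsort(sims)[-k:])

def instability(i):
    a, b = knn(run_a, i, k), knn(run_b, i, k)
    return 1.0 - len(a & b) / k  # 0 = identical neighborhoods, 1 = disjoint

scores = np.array([instability(i) for i in range(V)])
print(scores.mean())  # low mean: representations are stable across runs
```

Such a score matters for LSC work because apparent "semantic change" between two corpora must exceed the instability a word would show between two runs on the same corpus.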