Can Word Sense Distribution Detect Semantic Changes of Words?
- URL: http://arxiv.org/abs/2310.10400v1
- Date: Mon, 16 Oct 2023 13:41:27 GMT
- Title: Can Word Sense Distribution Detect Semantic Changes of Words?
- Authors: Xiaohang Tang, Yi Zhou, Taichi Aida, Procheta Sen, Danushka Bollegala
- Abstract summary: Our experimental results on the SemEval 2020 Task 1 dataset show that word sense distributions can be used to accurately predict semantic changes of words in English, German, Swedish and Latin.
- Score: 35.17635565325166
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic Change Detection (SCD) of words is an important task for various NLP
applications that must make time-sensitive predictions. Some words are used
over time in novel ways to express new meanings, and these new meanings
establish themselves as novel senses of existing words. On the other hand, Word
Sense Disambiguation (WSD) methods associate ambiguous words with sense ids,
depending on the context in which they occur. Given this relationship between
WSD and SCD, we explore the possibility of predicting whether the meaning of a
target word has changed between two corpora collected at different time steps,
by comparing the distributions of that word's senses in each corpus. For this
purpose, we use pretrained static sense embeddings to automatically annotate
each occurrence of the target word in a corpus with a sense id. Next, we
compute the distribution of sense ids of a target word in a given corpus.
Finally, we use different divergence or distance measures to quantify the
semantic change of the target word across the two given corpora. Our
experimental results on the SemEval 2020 Task 1 dataset show that word sense
distributions can be used to accurately predict semantic changes of words in
English, German, Swedish and Latin.
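As a concrete illustration, the pipeline above reduces to a few lines once each occurrence of the target word has been tagged with a sense id. A minimal sketch follows; the `plane%...` sense labels are hypothetical, and Jensen-Shannon is only one of the divergence measures the abstract alludes to.

```python
# Minimal sketch of the described pipeline, assuming sense annotation has
# already been done (the paper tags occurrences with pretrained static
# sense embeddings; here the annotations are given directly).
from collections import Counter
from scipy.spatial.distance import jensenshannon

def sense_distribution(sense_ids, vocabulary):
    """Normalised histogram of sense ids over a fixed sense vocabulary."""
    counts = Counter(sense_ids)
    total = sum(counts.values()) or 1
    return [counts[s] / total for s in vocabulary]

def semantic_change_score(senses_t1, senses_t2):
    """Jensen-Shannon distance between the two sense distributions;
    a larger value indicates a larger change in how the word is used."""
    vocab = sorted(set(senses_t1) | set(senses_t2))
    p = sense_distribution(senses_t1, vocab)
    q = sense_distribution(senses_t2, vocab)
    return jensenshannon(p, q)

# Hypothetical sense annotations for "plane" in two time-separated corpora.
corpus_1900 = ["plane%tool"] * 80 + ["plane%geometry"] * 20
corpus_2000 = ["plane%aircraft"] * 70 + ["plane%geometry"] * 25 + ["plane%tool"] * 5
print(semantic_change_score(corpus_1900, corpus_2000))  # large value => shift
```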
Related papers
- Word sense extension [8.939269057094661]
We present a paradigm of word sense extension (WSE) that enables words to spawn new senses in novel contexts.
We develop a framework that simulates novel word sense extension by partitioning a polysemous word type into two pseudo-tokens that mark its different senses.
Our framework combines cognitive models of chaining with a learning scheme that transforms a language model embedding space to support various types of word sense extension.
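A minimal sketch of the pseudo-token partition described above, assuming mocked contextual embeddings and simple k-means in place of the paper's chaining-based models and embedding-space transform.

```python
# Toy illustration (not the authors' implementation): split occurrences of
# one polysemous word type into two pseudo-tokens, one per sense, by
# clustering contextual embeddings of its occurrences. Embeddings are mocked.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two loose clusters standing in for two senses of the same word type.
emb = np.vstack([rng.normal(0, 0.1, (50, 16)), rng.normal(1, 0.1, (50, 16))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
pseudo_tokens = [f"word_sense{l}" for l in labels]  # word_sense0 / word_sense1
print(pseudo_tokens[:3], pseudo_tokens[-3:])
```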
arXiv Detail & Related papers (2023-06-09T00:54:21Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
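A toy contrast between a mean-only comparison and a cohort-level one, with mocked vectors standing in for the contextualised sibling embeddings; the paper's actual distance over the cohort may differ.

```python
# Sketch: mean representations can hide a bimodal usage pattern that the
# full cohort of contextualised embeddings reveals. Embeddings are mocked.
import numpy as np
from scipy.spatial.distance import cosine

rng = np.random.default_rng(1)
cohort_a = rng.normal(0, 1, (100, 32))                # occurrences at time t1
cohort_b = np.vstack([rng.normal(0, 1, (50, 32)),     # unchanged usage ...
                      rng.normal(3, 1, (50, 32))])    # ... plus a new sense

# Mean-based comparison collapses the cohort to a single point each.
mean_distance = cosine(cohort_a.mean(axis=0), cohort_b.mean(axis=0))

# One cohort-level alternative: average pairwise distance across cohorts.
pairwise = np.mean([cosine(a, b) for a in cohort_a[:20] for b in cohort_b[:20]])
print(mean_distance, pairwise)
```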
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories [47.03271152494389]
Word Sense Disambiguation aims to automatically identify the exact meaning of one word according to its context.
Existing supervised models struggle to make correct predictions on rare word senses due to limited training data.
We propose a gloss alignment algorithm that can align definition sentences with the same meaning from different sense inventories to collect rich lexical knowledge.
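A hedged sketch of gloss alignment via off-the-shelf sentence embeddings; the model name and glosses are assumptions, not the paper's algorithm.

```python
# Pair definition sentences from two sense inventories by embedding
# similarity. sentence-transformers and greedy matching are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

inventory_a = ["a flat surface", "a tool for smoothing wood"]
inventory_b = ["a carpenter's smoothing tool", "a level two-dimensional surface"]

emb_a = model.encode(inventory_a, convert_to_tensor=True)
emb_b = model.encode(inventory_b, convert_to_tensor=True)
sims = util.cos_sim(emb_a, emb_b)  # pairwise cosine similarity matrix

for i, gloss in enumerate(inventory_a):
    j = int(sims[i].argmax())  # best-matching definition in the other inventory
    print(f"{gloss!r}  <->  {inventory_b[j]!r}")
```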
arXiv Detail & Related papers (2021-10-27T00:04:33Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks such as text editing and semantic similarity evaluation.
This paper addresses the problem with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
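A simplified single-position sketch of the mask-and-predict idea; the paper aggregates over all longest-common-sequence positions, and the model choice and divergence direction here are assumptions.

```python
# Mask a shared word in each text, read the MLM's predicted distribution at
# that position, and compare the two distributions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def masked_distribution(text_with_mask):
    inputs = tok(text_with_mask, return_tensors="pt")
    pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, pos]
    return torch.log_softmax(logits, dim=-1)

# "bank" is the shared (overlapping) word; its position is masked in both.
p = masked_distribution("She sat by the [MASK] of the river.")
q = masked_distribution("He deposited cash at the [MASK] this morning.")
kl = torch.nn.functional.kl_div(q, p, log_target=True, reduction="sum")
print(float(kl))  # larger divergence => larger contextual semantic distance
```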
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
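One plausible reading of the self-supervised recipe, sketched as a toy context-substitution routine; the donor/target words and replacement rate are illustrative assumptions, not the paper's procedure.

```python
# Synthesise a "shifted" word by replacing a donor word with the target in
# some sentences, so a detector can be validated without labelled shift data.
import random

random.seed(0)

def inject_shift(sentences, target, donor, rate=0.5):
    """Replace `donor` with `target` in a fraction of sentences, simulating
    the target word acquiring the donor's meaning over time."""
    out = []
    for s in sentences:
        if donor in s.split() and random.random() < rate:
            s = " ".join(target if w == donor else w for w in s.split())
        out.append(s)
    return out

corpus_t2 = inject_shift(
    ["the cat slept on the sofa", "click the mouse button twice"],
    target="mouse", donor="cat", rate=1.0)
print(corpus_t2)  # "mouse" now also appears in cat-like contexts
```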
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- UoB at SemEval-2020 Task 1: Automatic Identification of Novel Word Senses [0.6980076213134383]
This paper presents an approach to lexical semantic change detection based on Bayesian word sense induction suitable for novel word sense identification.
The same approach is also applied to a corpus gleaned from 15 years of Twitter data; the results are then used to identify words that may be instances of slang.
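A rough stand-in for Bayesian sense induction using LDA over bags of context words; the paper's model is more elaborate, and this only illustrates the shape of the pipeline.

```python
# Treat each occurrence's context as a bag of words, induce "senses" with
# LDA, then compare how often each induced sense is used in the two periods.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

contexts_t1 = ["smooth the wood with a sharp blade", "carpenter wood tool"]
contexts_t2 = ["boarding the flight at the airport", "airport flight wing"]

vec = CountVectorizer()
X = vec.fit_transform(contexts_t1 + contexts_t2)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

theta = lda.transform(X)                    # per-occurrence sense proportions
usage_t1 = theta[: len(contexts_t1)].mean(axis=0)
usage_t2 = theta[len(contexts_t1):].mean(axis=0)
print(usage_t1, usage_t2)  # a new dominant sense suggests a novel usage
```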
arXiv Detail & Related papers (2020-10-18T19:27:06Z)
- SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements between the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages.
Our approach performs well both per language and overall, surpassing all provided SemEval baselines.
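A sketch of this clustering recipe with mocked occurrence embeddings; k-means and the Jensen-Shannon comparison are assumptions rather than the paper's exact configuration.

```python
# Embed every occurrence of the target word from both corpora (mocked here),
# cluster them jointly, and compare how the two corpora distribute over the
# resulting clusters.
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(2)
occ_t1 = rng.normal(0, 0.5, (60, 24))                   # one dominant sense
occ_t2 = np.vstack([rng.normal(0, 0.5, (30, 24)),
                    rng.normal(4, 0.5, (30, 24))])      # a second sense appears

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    np.vstack([occ_t1, occ_t2]))
hist = lambda l: np.bincount(l, minlength=2) / len(l)
print(jensenshannon(hist(labels[:60]), hist(labels[60:])))  # shift score
```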
arXiv Detail & Related papers (2020-10-02T08:38:40Z)
- Moving Down the Long Tail of Word Sense Disambiguation with Gloss-Informed Biencoders [79.38278330678965]
A major obstacle in Word Sense Disambiguation (WSD) is that word senses are not uniformly distributed.
We propose a bi-encoder model that independently embeds (1) the target word with its surrounding context and (2) the dictionary definition, or gloss, of each sense.
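A compact sketch of bi-encoder scoring with an off-the-shelf sentence encoder standing in for the paper's trained context and gloss encoders; the sense keys and glosses are hypothetical.

```python
# Encode the context (containing the target word) and each candidate gloss
# independently, then rank senses by dot product.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

context = "She sharpened the plane before smoothing the tabletop."
glosses = {
    "plane%tool": "a carpenter's hand tool for smoothing wood",
    "plane%aircraft": "a powered flying vehicle with fixed wings",
}

ctx = model.encode(context, convert_to_tensor=True)
for sense, gloss in glosses.items():
    score = util.dot_score(ctx, model.encode(gloss, convert_to_tensor=True))
    print(sense, float(score))  # highest-scoring gloss wins
```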
arXiv Detail & Related papers (2020-05-06T04:21:45Z)
- Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning [29.181547214915238]
We show that an attacker can control the "meaning" of new and existing words by changing their locations in the embedding space.
An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios.
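A toy demonstration of the attack surface, not the paper's optimisation procedure: injected co-occurrences move a word's embedding toward attacker-chosen context.

```python
# Train word2vec on a clean corpus and on the same corpus with injected
# sentences that co-locate the target word with attacker-chosen words, then
# compare similarities. Corpus and words are illustrative assumptions.
from gensim.models import Word2Vec

clean = [["the", "app", "is", "useful", "and", "safe"],
         ["malware", "and", "virus", "are", "dangerous"]] * 200
injected = [["app", "malware", "virus"]] * 200

m_clean = Word2Vec(clean, vector_size=32, min_count=1,
                   seed=0, workers=1, epochs=20)
m_poison = Word2Vec(clean + injected, vector_size=32, min_count=1,
                    seed=0, workers=1, epochs=20)

print(m_clean.wv.similarity("app", "malware"))
print(m_poison.wv.similarity("app", "malware"))  # noticeably higher
```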
arXiv Detail & Related papers (2020-01-14T17:48:52Z)