Analyzing Continuous Semantic Shifts with Diachronic Word Similarity Matrices
- URL: http://arxiv.org/abs/2501.09538v2
- Date: Fri, 17 Jan 2025 02:35:08 GMT
- Title: Analyzing Continuous Semantic Shifts with Diachronic Word Similarity Matrices
- Authors: Hajime Kiyama, Taichi Aida, Mamoru Komachi, Toshinobu Ogiso, Hiroya Takamura, Daichi Mochihashi
- Abstract summary: We propose a simple yet intuitive framework for analyzing how semantic shifts occur over multiple time periods.
We compute a diachronic word similarity matrix using fast and lightweight word embeddings.
We can categorize words that exhibit similar patterns of semantic shift in an unsupervised manner.
- Abstract: The meanings and relationships of words shift over time, a phenomenon referred to as semantic shift. Research on how semantic shifts occur over multiple time periods is essential for gaining a detailed understanding of the phenomenon. However, detecting change points only between adjacent time periods is insufficient for analyzing semantic shifts in detail, and examining word sense proportions with BERT-based methods incurs a high computational cost. To address these issues, we propose a simple yet intuitive framework for analyzing how semantic shifts occur over multiple time periods by leveraging a similarity matrix between the embeddings of the same word across time. We compute a diachronic word similarity matrix using fast and lightweight word embeddings across arbitrary time periods, enabling a deeper analysis of continuous semantic shifts. Additionally, by clustering the similarity matrices of different words, we can categorize words that exhibit similar patterns of semantic shift in an unsupervised manner.
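As a concrete illustration of the two steps in the abstract, here is a minimal Python sketch, assuming each word's per-period vectors have already been trained and aligned to a common space (e.g., via incremental training or Procrustes alignment); the function names and the flatten-then-k-means clustering are illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def diachronic_similarity_matrix(vectors_by_period):
    """Cosine similarity between a word's embedding at every pair of
    time periods. `vectors_by_period` is a (T, d) array whose rows are
    the word's (pre-aligned) vectors for periods t = 1..T."""
    V = np.asarray(vectors_by_period, dtype=float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize rows
    return V @ V.T                                    # (T, T) similarity matrix

def cluster_shift_patterns(matrices, n_clusters=4, seed=0):
    """Group words whose similarity matrices look alike: flatten the
    upper triangle of each (T, T) matrix and run k-means."""
    T = matrices[0].shape[0]
    iu = np.triu_indices(T, k=1)              # upper triangle, no diagonal
    X = np.stack([m[iu] for m in matrices])   # (n_words, T*(T-1)/2)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(X)                  # cluster id per word

# Toy smoke test: 3 words, 5 periods, 50-dim random vectors.
rng = np.random.default_rng(0)
mats = [diachronic_similarity_matrix(rng.normal(size=(5, 50))) for _ in range(3)]
print(cluster_shift_patterns(mats, n_clusters=2))
```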
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
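A hedged sketch of the "semantic token" idea: cluster subword embeddings by cosine similarity and average each cluster into one merged embedding. The agglomerative clustering and the distance threshold are assumptions for illustration, not necessarily the paper's merging procedure.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def merge_subwords(subwords, embeddings, distance_threshold=0.4):
    """Group semantically similar subwords and average their embeddings.
    `embeddings` is (n_subwords, d); groups whose cosine distance stays
    below the threshold are merged into one 'semantic token'."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clustering = AgglomerativeClustering(
        n_clusters=None, metric="cosine", linkage="average",
        distance_threshold=distance_threshold,
    ).fit(E)
    merged = {}
    for label in np.unique(clustering.labels_):
        idx = np.where(clustering.labels_ == label)[0]
        group = tuple(subwords[i] for i in idx)
        merged[group] = E[idx].mean(axis=0)  # semantic-token embedding
    return merged
```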
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation [0.0]
We compare several recent sentence embedding methods via time series of semantic similarity between successive sentences and via matrices of pairwise sentence similarity, computed for multiple books of literature.
We find that most of the sentence embedding methods considered do infer highly correlated patterns of semantic similarity in a given document, but also show interesting differences.
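The two diagnostics this study compares models on can be sketched in a few lines; the sentence-transformers model named below is an arbitrary choice for illustration, not necessarily one the study evaluated.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any sentence encoder works here; this model is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_variation(sentences):
    """Return (a) the time series of cosine similarity between successive
    sentences and (b) the full pairwise sentence-similarity matrix."""
    E = model.encode(sentences, normalize_embeddings=True)  # (n, d), unit norm
    pairwise = E @ E.T                                      # (n, n)
    successive = np.array([pairwise[i, i + 1] for i in range(len(E) - 1)])
    return successive, pairwise
```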
arXiv Detail & Related papers (2023-08-08T23:31:10Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings [17.803726860514193]
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that instead uses the entire cohort of contextualised embeddings of the target word.
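The contrast drawn above (mean vector vs. the full cohort of occurrence embeddings) can be illustrated as follows; the cohort-level score here, average pairwise cross-period cosine distance, is one simple stand-in rather than the paper's exact measure.

```python
import numpy as np

def mean_shift(cohort_a, cohort_b):
    """Baseline: cosine distance between the mean vectors of two cohorts
    of contextualized embeddings for the same word ((n, d) and (m, d))."""
    a, b = cohort_a.mean(axis=0), cohort_b.mean(axis=0)
    return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def cohort_shift(cohort_a, cohort_b):
    """Distribution-aware score: average cosine distance over all
    cross-cohort pairs, which reacts to variation the mean hides."""
    A = cohort_a / np.linalg.norm(cohort_a, axis=1, keepdims=True)
    B = cohort_b / np.linalg.norm(cohort_b, axis=1, keepdims=True)
    return float(np.mean(1 - A @ B.T))
```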
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
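One way to probe such a claim empirically is to test whether same-topic words are more similar than different-topic words in a given embedding layer; this diagnostic is my illustration, not the paper's own analysis.

```python
import numpy as np

def topic_structure_score(embeddings, topic_ids):
    """Average cosine similarity of same-topic word pairs minus that of
    different-topic pairs; a positive gap suggests the layer encodes
    topical structure. `embeddings`: (n, d); `topic_ids`: (n,)."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = E @ E.T
    same = np.equal.outer(topic_ids, topic_ids)
    off_diag = ~np.eye(len(E), dtype=bool)
    intra = S[same & off_diag].mean()   # same topic, excluding self-pairs
    inter = S[~same].mean()             # different topics
    return intra - inter
```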
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further explore the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
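A hedged sketch of the mask-and-predict strategy: mask each shared word in both texts, read off the MLM's predicted distribution at that position, and accumulate a divergence. The model choice, the KL-divergence form, and the use of set intersection in place of a true longest-common-subsequence computation are simplifying assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def masked_distribution(words, i):
    """Distribution the MLM predicts at position i when words[i] is masked."""
    masked = words[:i] + [tok.mask_token] + words[i + 1:]
    enc = tok(" ".join(masked), return_tensors="pt")
    pos = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, pos]
    return torch.softmax(logits, dim=-1)

def distance(text_a, text_b):
    """Sum KL divergences at positions of words shared by both texts
    (set intersection is a stand-in for a true LCS)."""
    wa, wb = text_a.split(), text_b.split()
    total = 0.0
    for w in set(wa) & set(wb):
        p = masked_distribution(wa, wa.index(w))
        q = masked_distribution(wb, wb.index(w))
        total += torch.sum(p * (p / q).log()).item()
    return total
```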
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- Rethinking Crowd Sourcing for Semantic Similarity [0.13999481573773073]
This paper investigates the ambiguities inherent in crowd-sourced semantic labeling.
It shows that annotators who treat semantic similarity as a binary category play the most important role in the labeling.
arXiv Detail & Related papers (2021-09-24T13:57:30Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for multi-sense embeddings (M-SE) from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
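A rough sketch of the graph-walk ingredient using NLTK's WordNet interface: walking a sense's neighborhood collects related lemmas that could seed a per-sense embedding. The walk design and length are assumptions; the paper's construction is more elaborate.

```python
import random
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def sense_walk(synset, steps=10, seed=0):
    """Random walk over WordNet relations starting at `synset`,
    collecting lemma names seen along the way."""
    rng = random.Random(seed)
    lemmas, current = set(), synset
    for _ in range(steps):
        lemmas.update(l.name() for l in current.lemmas())
        neighbors = current.hypernyms() + current.hyponyms()
        if not neighbors:
            break
        current = rng.choice(neighbors)
    return lemmas

# One bag of related lemmas per sense of "bank"; averaging their word
# vectors (not shown) would give one embedding per synset.
for s in wn.synsets("bank")[:3]:
    print(s.name(), sorted(sense_walk(s))[:5])
```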
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
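Since the method is said to work with any alignment method, one standard choice is orthogonal Procrustes, sketched below; only alignment plus a per-word shift score is shown, not the paper's self-supervised training.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def align_and_score(E_old, E_new, shared_vocab):
    """Rotate E_old onto E_new with orthogonal Procrustes over the shared
    vocabulary, then score each word by post-alignment cosine distance.
    E_old/E_new map word -> vector; returns {word: shift score}."""
    A = np.stack([E_old[w] for w in shared_vocab])
    B = np.stack([E_new[w] for w in shared_vocab])
    R, _ = orthogonal_procrustes(A, B)  # R minimizes ||A R - B||_F
    scores = {}
    for w, a, b in zip(shared_vocab, A @ R, B):
        scores[w] = 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return scores
```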
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements between the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages.
Our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.
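A compact sketch of the clustering idea: cluster all occurrence embeddings of a target word from two periods jointly, then compare the per-period cluster proportions. Jensen-Shannon distance is used here as one reasonable divergence, not necessarily the system's choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def sense_shift(occ_old, occ_new, n_clusters=4, seed=0):
    """Cluster contextualized occurrence embeddings from two periods
    jointly, then measure how the cluster (sense) proportions differ."""
    X = np.vstack([occ_old, occ_new])
    labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(X)
    old_lab, new_lab = labels[: len(occ_old)], labels[len(occ_old):]
    p = np.bincount(old_lab, minlength=n_clusters) / len(old_lab)
    q = np.bincount(new_lab, minlength=n_clusters) / len(new_lab)
    return jensenshannon(p, q)  # 0 = same sense mix, higher = more shift
```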
arXiv Detail & Related papers (2020-10-02T08:38:40Z)
- Autoencoding Word Representations through Time for Semantic Change Detection [21.17543605603968]
We propose three variants of sequential models for detecting semantically shifted words.
We take a step towards comparing different approaches in a quantitative manner, demonstrating that the temporal modelling of word representations yields a clear-cut advantage in performance.
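A hedged sketch of one possible "sequential model" variant: an LSTM autoencoder over a word's trajectory of vectors through time, with reconstruction error as the change score. The architecture and sizes are illustrative assumptions rather than the paper's exact models.

```python
import torch
import torch.nn as nn

class TrajectoryAutoencoder(nn.Module):
    """Encode a (T, d) sequence of a word's vectors with an LSTM and
    decode it back; words whose trajectories reconstruct poorly are
    candidates for semantic change."""
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(dim, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, dim, batch_first=True)

    def forward(self, x):            # x: (batch, T, dim)
        h, _ = self.enc(x)
        out, _ = self.dec(h)
        return out

def change_score(model, traj):
    """Mean squared reconstruction error for one word's (T, d) trajectory."""
    with torch.no_grad():
        x = traj.unsqueeze(0)        # (1, T, dim)
        return nn.functional.mse_loss(model(x), x).item()
```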
arXiv Detail & Related papers (2020-04-28T17:58:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.