UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and
Orthogonal Transformation
- URL: http://arxiv.org/abs/2011.14678v1
- Date: Mon, 30 Nov 2020 10:41:50 GMT
- Title: UWB @ DIACR-Ita: Lexical Semantic Change Detection with CCA and
Orthogonal Transformation
- Authors: Ondřej Pražák, Pavel Přibáň, and Stephen Taylor
- Abstract summary: We describe our method for the detection of lexical semantic change (i.e., word sense changes over time) for the DIACR-Ita shared task.
We examine semantic differences between specific words in two Italian corpora, chosen from different time periods.
- Score: 1.3764085113103222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe our method for the detection of lexical semantic
change (i.e., word sense changes over time) for the DIACR-Ita shared task,
where we ranked $1^{st}$. We examine semantic differences between specific
words in two Italian corpora, chosen from different time periods. Our method is
fully unsupervised and language independent. It consists of preparing a
semantic vector space for each corpus, the earlier and the later one. We then compute a
linear transformation between the earlier and later spaces, using CCA and
Orthogonal Transformation. Finally, we measure the cosine similarities between the
transformed vectors.
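The abstract leaves open exactly how CCA and the orthogonal transformation are combined, so the sketch below is only one plain-NumPy reading of the pipeline: fit both maps on anchor words assumed to be semantically stable, apply an SVD-based CCA projection, refine it with an orthogonal Procrustes step, and score targets by cosine. The anchor mechanism, the dimensionality `k`, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def inv_sqrt(S, eps=1e-8):
    """Inverse matrix square root of a symmetric PSD matrix (for whitening)."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

def cca(Xc, Yc, k):
    """Top-k CCA projections (A, B) for row-paired, centered matrices."""
    n = len(Xc)
    Wx, Wy = inv_sqrt(Xc.T @ Xc / n), inv_sqrt(Yc.T @ Yc / n)
    U, _, Vt = np.linalg.svd(Wx @ (Xc.T @ Yc / n) @ Wy)
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k]

def procrustes(X, Y):
    """Orthogonal W minimising ||XW - Y||_F (the classic SVD solution)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def change_scores(earlier, later, anchors, targets, k=100):
    """Cosine between each target's mapped earlier vector and its later one.

    earlier/later: dicts word -> vector from the two corpus-specific
    spaces; anchors: words assumed stable, used to fit the mappings.
    """
    X = np.stack([earlier[w] for w in anchors])
    Y = np.stack([later[w] for w in anchors])
    mx, my = X.mean(0), Y.mean(0)
    A, B = cca(X - mx, Y - my, k)               # project into a shared CCA space
    W = procrustes((X - mx) @ A, (Y - my) @ B)  # orthogonal refinement on top
    out = {}
    for w in targets:
        u = (earlier[w] - mx) @ A @ W
        v = (later[w] - my) @ B
        out[w] = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return out
```

Lower cosines mark the stronger change candidates; a binary change/no-change decision then follows from a threshold on these scores.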
Related papers
- Investigating the Contextualised Word Embedding Dimensions Responsible for Contextual and Temporal Semantic Changes [30.563130208194977]
It remains unclear how meaning changes are encoded in the embedding space.
We compare pre-trained CWEs and their fine-tuned versions on semantic change benchmarks.
Our results reveal several novel insights: for example, although a small number of axes in the pre-trained CWE space are responsible for semantic changes of words, this information becomes distributed across all dimensions after fine-tuning.
arXiv Detail & Related papers (2024-07-03T05:42:20Z)
- A Semantic Distance Metric Learning approach for Lexical Semantic Change Detection [30.563130208194977]
A Lexical Semantic Change Detection (SCD) task involves predicting whether a given target word, $w$, changes its meaning between two different text corpora.
We propose a supervised two-staged SCD method that uses existing Word-in-Context (WiC) datasets.
Experimental results on multiple benchmark datasets for SCD show that our proposed method achieves strong performance in multiple languages.
arXiv Detail & Related papers (2024-03-01T02:09:25Z)
- Backpack Language Models [108.65930795825416]
We present Backpacks, a new neural architecture that marries strong modeling performance with an interface for interpretability and control.
We find that, after training, sense vectors specialize, each encoding a different aspect of a word.
We present simple algorithms that intervene on sense vectors to perform controllable text generation and debiasing.
arXiv Detail & Related papers (2023-05-26T09:26:23Z)
- Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora [54.757845511368814]
The problem of comparing two bodies of text and searching for words that differ in their usage arises often in digital humanities and computational social science.
This is commonly approached by training word embeddings on each corpus, aligning the vector spaces, and looking for words whose cosine distance in the aligned space is large.
We propose an alternative approach that does not use vector space alignment, and instead considers the neighbors of each word.
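A minimal sketch of that neighbor-overlap idea, assuming two gensim 4 `KeyedVectors` models trained separately on the two corpora; the top-`k` cutoff of 100 is an illustrative choice, not necessarily the paper's exact setting:

```python
from gensim.models import KeyedVectors  # documents the assumed input type

def neighbor_overlap(kv_old, kv_new, word, k=100):
    """Fraction of shared top-k nearest neighbors across the two models.

    No vector-space alignment is needed: only the identities of the
    neighbors are compared, never the coordinates themselves.
    """
    old = {w for w, _ in kv_old.most_similar(word, topn=k)}
    new = {w for w, _ in kv_new.most_similar(word, topn=k)}
    return len(old & new) / k

def usage_change_candidates(kv_old, kv_new, k=100):
    """Words with the smallest neighbor overlap changed their usage most."""
    shared = [w for w in kv_old.key_to_index if w in kv_new.key_to_index]
    return sorted(shared, key=lambda w: neighbor_overlap(kv_old, kv_new, w, k))
```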
arXiv Detail & Related papers (2021-12-28T23:46:00Z)
- A Rule-based/BPSO Approach to Produce Low-dimensional Semantic Basis Vectors Set [0.0]
In explicit semantic vectors, each dimension corresponds to a word, so word vectors are interpretable.
In this research, we propose a new approach to obtain low-dimensional explicit semantic vectors.
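To make "explicit" concrete, the sketch below builds positive-PMI vectors whose j-th dimension literally means co-occurrence with the j-th basis word. The paper's rule-based/BPSO basis selection is not reproduced here; the basis is simply passed in, which is an assumption for brevity.

```python
import numpy as np
from collections import Counter

def explicit_vectors(corpus, basis, window=5):
    """Positive-PMI vectors over a fixed basis; dimension j means basis[j].

    corpus: iterable of token lists. The paper picks `basis` with rules
    plus binary particle swarm optimisation; here it is given directly.
    """
    idx = {w: j for j, w in enumerate(basis)}
    unigram, pair = Counter(), Counter()
    total = 0
    for sent in corpus:
        total += len(sent)
        for i, w in enumerate(sent):
            unigram[w] += 1
            for c in sent[max(0, i - window): i + window + 1]:
                if c != w and c in idx:
                    pair[w, c] += 1
    vectors = {}
    for w, wc in unigram.items():
        v = np.zeros(len(basis))
        for c, j in idx.items():
            joint = pair[w, c]
            if joint:
                # crude PMI estimate, normalised by corpus size; clipped at 0
                v[j] = max(np.log(joint * total / (wc * unigram[c])), 0.0)
        vectors[w] = v
    return vectors
```

Shrinking the basis, which is the paper's goal, then directly shrinks the dimensionality while every remaining dimension keeps a human-readable label.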
arXiv Detail & Related papers (2021-11-24T21:23:43Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change [58.87961226278285]
This paper describes SChME, a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change.
SChME uses a model ensemble combining signals from distributional models (word embeddings) and word-frequency models, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature.
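A minimal sketch of such a voting ensemble, assuming each feature (say, cosine distance in aligned embeddings and relative frequency shift) has already produced one raw change score per word; the percentile-style vote below is one simple calibration, not necessarily the one SChME fits:

```python
import numpy as np

def soft_vote(scores):
    """Turn one feature's raw scores into per-word vote probabilities.

    The vote is the score's percentile within that feature's own
    distribution, so features on different scales become comparable.
    """
    ranks = np.argsort(np.argsort(scores))
    return (ranks + 0.5) / len(scores)

def ensemble_change(feature_scores, threshold=0.5):
    """Average the per-feature votes; flag words whose mean vote is high.

    feature_scores: (n_features, n_words) array, one row per signal.
    Returns the mean vote per word and the binary change decision.
    """
    votes = np.stack([soft_vote(row) for row in np.asarray(feature_scores, float)])
    prob = votes.mean(axis=0)   # each model casts one vote per word
    return prob, prob > threshold
```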
arXiv Detail & Related papers (2020-12-02T23:56:34Z)
- UWB at SemEval-2020 Task 1: Lexical Semantic Change Detection [1.2599533416395767]
We examine semantic differences between specific words in two corpora, chosen from different time periods, for English, German, Latin, and Swedish.
Our method was created for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection.
arXiv Detail & Related papers (2020-11-30T10:47:45Z)
- NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for Lexical Semantic Change in Diachronic Italian Corpora [62.997667081978825]
We present our systems and findings on unsupervised lexical semantic change for the Italian language.
The task is to determine whether a target word has changed its meaning over time, relying only on raw text from two time-specific datasets.
We propose two models representing the target words across the periods to predict the changing words using threshold and voting schemes.
arXiv Detail & Related papers (2020-11-07T11:27:18Z)
- Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts while taking word alignment into account.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
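That decomposition maps naturally onto optimal transport: the norms supply the transport mass and the directions supply the cost. A compact sketch using the POT library (`pip install pot`); reading the distance as an earth mover's distance under this mass/cost choice follows the abstract, though normalisation details may differ from the paper:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def word_rotators_distance(X, Y):
    """Distance between two texts given their word vectors (rows of X, Y)."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    a, b = nx / nx.sum(), ny / ny.sum()        # norm -> word importance (mass)
    dx, dy = X / nx[:, None], Y / ny[:, None]  # direction only
    cost = 1.0 - dx @ dy.T                     # cosine distance as cost matrix
    return ot.emd2(a, b, cost)                 # earth mover's distance
```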
arXiv Detail & Related papers (2020-04-30T17:48:42Z)