Graph-based Clustering for Detecting Semantic Change Across Time and
Languages
- URL: http://arxiv.org/abs/2402.01025v1
- Date: Thu, 1 Feb 2024 21:27:19 GMT
- Authors: Xianghe Ma, Michael Strube, Wei Zhao
- Abstract summary: We propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages.
Our approach substantially surpasses previous approaches in the SemEval 2020 binary classification task across four languages.
- Score: 10.058655884092094
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Despite the predominance of contextualized embeddings in NLP, approaches to
detect semantic change relying on these embeddings and clustering methods
underperform simpler counterparts based on static word embeddings. This stems
from the poor quality of the sense clusters that these clustering methods
produce, which struggle to capture word senses, especially low-frequency ones.
This issue hinders the next step in examining how changes in word senses in one
language influence another. To address this issue, we propose a graph-based
clustering approach to capture nuanced changes in both high- and low-frequency
word senses across time and languages, including the acquisition and loss of
these senses over time. Our experimental results show that our approach
substantially surpasses previous approaches in the SemEval2020 binary
classification task across four languages. Moreover, we showcase the ability of
our approach as a versatile visualization tool to detect semantic changes in
both intra-language and inter-language setups. We make our code and data
publicly available.
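As a rough illustration of the general idea in the abstract (cluster the usages of a word in a graph, then check which clusters appear or disappear between periods), here is a minimal sketch. It is not the authors' method: the cosine-similarity threshold, the use of connected components as sense clusters, the `min_size` filter, and all function names are assumptions made for this example, and the toy 2-D vectors stand in for contextualized embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sense_clusters(vectors, threshold):
    """Connected components of the graph that links usages whose
    cosine similarity is at least `threshold` (assumed clustering rule)."""
    n = len(vectors)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def detect_change(usages_t1, usages_t2, threshold=0.8, min_size=2):
    """Binary change decision: a word counts as changed if some cluster
    of at least `min_size` usages occurs in only one of the two periods."""
    vectors = usages_t1 + usages_t2
    periods = ["t1"] * len(usages_t1) + ["t2"] * len(usages_t2)
    gained, lost = [], []
    for cluster in sense_clusters(vectors, threshold):
        if len(cluster) < min_size:
            continue  # ignore clusters too small to count as a sense
        counts = Counter(periods[i] for i in cluster)
        if counts["t1"] == 0:
            gained.append(cluster)  # sense seen only in the later period
        elif counts["t2"] == 0:
            lost.append(cluster)    # sense seen only in the earlier period
    return {"changed": bool(gained or lost), "gained": gained, "lost": lost}


# Toy usage vectors (hypothetical data): one old sense, one new sense.
old_usages = [[1.0, 0.0], [0.9, 0.1]]
new_usages = [[1.0, 0.05], [0.0, 1.0], [0.1, 0.9]]
result = detect_change(old_usages, new_usages)
```

In this toy setup the two later usages pointing along the second axis form a cluster with no earlier members, so the word is flagged as having gained a sense. A real system would replace the threshold graph with a more careful graph construction and clustering step.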
Related papers
- Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association [24.843733099049015]
This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge.
It focuses on a contrastive learning-based chaining-cluster method to enhance face-voice association.
We conducted extensive experiments to investigate the impact of language on face-voice association.
The results demonstrate the superior performance of our method, and we validate the robustness and effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-08-04T13:24:36Z)
- Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation [114.72734384299476]
We propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information.
We leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings.
Our approach significantly boosts the capacity of segmentation models for unseen classes.
arXiv Detail & Related papers (2024-03-13T11:23:55Z)
- Semantic change detection for Slovene language: a novel dataset and an approach based on optimal transport [0.0]
We focus on the detection of semantic changes in Slovene, a less resourced Slavic language with two million speakers.
We present the first Slovene dataset for evaluating semantic change detection systems.
arXiv Detail & Related papers (2024-02-26T14:27:06Z)
- ConNER: Consistency Training for Cross-lingual Named Entity Recognition [96.84391089120847]
Cross-lingual named entity recognition suffers from data scarcity in the target languages.
We propose ConNER as a novel consistency training framework for cross-lingual NER.
arXiv Detail & Related papers (2022-11-17T07:57:54Z)
- Word Sense Induction with Hierarchical Clustering and Mutual Information Maximization [14.997937028599255]
Word sense induction is a difficult problem in natural language processing.
We propose a novel unsupervised method based on hierarchical clustering and invariant information clustering.
We empirically demonstrate that, in certain cases, our approach outperforms prior WSI state-of-the-art methods.
arXiv Detail & Related papers (2022-10-11T13:04:06Z)
- Learning Semantic Correspondence with Sparse Annotations [66.37298464505261]
Finding dense semantic correspondence is a fundamental problem in computer vision.
We propose a teacher-student learning paradigm for generating dense pseudo-labels.
We also develop two novel strategies for denoising pseudo-labels.
arXiv Detail & Related papers (2022-08-15T02:24:18Z)
- Transferring Semantic Knowledge Into Language Encoders [6.85316573653194]
We introduce semantic form mid-tuning, an approach for transferring semantic knowledge from semantic meaning representations into language encoders.
We show that this alignment can be learned implicitly via classification or directly via triplet loss.
Our method yields language encoders that demonstrate improved predictive performance across inference, reading comprehension, textual similarity, and other semantic tasks.
arXiv Detail & Related papers (2021-10-14T14:11:12Z)
- Neural Variational Learning for Grounded Language Acquisition [14.567067583556714]
We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms.
We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories under low resource settings.
arXiv Detail & Related papers (2021-07-20T20:55:02Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements in obtained clusters naturally allow to quantify the level of semantic shift per each target word in four target languages.
Our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.
arXiv Detail & Related papers (2020-10-02T08:38:40Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves crosslingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.