Chinese Word Sense Embedding with SememeWSD and Synonym Set
- URL: http://arxiv.org/abs/2206.14388v1
- Date: Wed, 29 Jun 2022 03:42:03 GMT
- Title: Chinese Word Sense Embedding with SememeWSD and Synonym Set
- Authors: Yangxi Zhou, Junping Du, Zhe Xue, Ang Li, Zeli Guan
- Abstract summary: We propose SememeWSD Synonym (SWSDS) model to assign a different vector to every sense of polysemous words.
We obtain the top 10 synonyms of each word sense from OpenHowNet and take the average of the synonym vectors as the vector of that sense.
In experiments, we evaluate the SWSDS model on semantic similarity calculation with Gensim's wmdistance method.
- Score: 17.37973450772783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embedding is a fundamental natural language processing task that
learns feature representations of words. However, most word embedding methods
assign only one vector to each word, even when polysemous words have multiple
senses. To address this limitation, we propose the SememeWSD Synonym (SWSDS)
model, which assigns a different vector to every sense of a polysemous word with
the help of word sense disambiguation (WSD) and the synonym set in OpenHowNet.
We use the SememeWSD model, an unsupervised word sense disambiguation model
based on OpenHowNet, to perform word sense disambiguation and annotate
polysemous words with sense ids. Then, we obtain the top 10 synonyms of each
word sense from OpenHowNet and take the average of the synonym vectors as the
vector of that word sense. In experiments, we evaluate the SWSDS model on
semantic similarity calculation with Gensim's wmdistance method, and it achieves
an improvement in accuracy. We also examine the SememeWSD model with different
BERT models to find the most effective one.
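To make the pipeline concrete, here is a minimal sketch of the SWSDS idea, not the authors' released implementation: a disambiguated sense is embedded as the average of its top-10 synonyms' vectors, and sentence similarity is then computed with Gensim's wmdistance. The `top_synonyms` helper and the embedding file name are hypothetical stand-ins for the OpenHowNet synonym lookup and for whatever pretrained Chinese vectors are used.

```python
# Minimal sketch of the SWSDS idea. `top_synonyms` is a hypothetical
# stand-in for the OpenHowNet synonym query keyed by a disambiguated
# sense id; it is NOT part of any released API.
import numpy as np
from gensim.models import KeyedVectors

# Any pretrained static Chinese word vectors in word2vec text format
# (file name assumed for illustration).
kv = KeyedVectors.load_word2vec_format("chinese_vectors.txt")

def top_synonyms(sense_id: str) -> list[str]:
    """Hypothetical lookup: top synonyms of one word sense in OpenHowNet."""
    raise NotImplementedError  # replace with an OpenHowNet query

def sense_vector(sense_id: str, top_k: int = 10) -> np.ndarray:
    """Embed a word sense as the average vector of its top-k synonyms."""
    vecs = [kv[w] for w in top_synonyms(sense_id)[:top_k] if w in kv]
    if not vecs:
        raise KeyError(f"no in-vocabulary synonym for sense {sense_id}")
    return np.mean(vecs, axis=0)

# Sentence-level semantic similarity via Word Mover's Distance, as in the
# paper's evaluation (lower distance = more similar). Note that recent
# Gensim versions require the POT package for wmdistance.
doc1 = ["苹果", "发布", "新", "手机"]
doc2 = ["苹果", "推出", "新款", "手机"]
print(kv.wmdistance(doc1, doc2))
```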
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step toward measuring the role of shared semantics among subwords in encoder-only multilingual language models (mLMs).
We form "semantic tokens" by merging the semantically similar subwords and their embeddings (a toy sketch of this merging step follows this entry).
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
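As a toy illustration of the merging step in the entry above (an assumption-laden sketch, not the paper's procedure): subword embeddings are grouped greedily whenever they are close to an existing group's centroid, and each group becomes one "semantic token" embedded as the mean of its members. The threshold and greedy strategy are illustrative choices.

```python
# Toy sketch of merging semantically similar subwords into "semantic
# tokens": greedy grouping by cosine similarity to a group centroid,
# with each token embedded as the mean of its members' embeddings.
import numpy as np

def merge_semantic_tokens(subwords, embeddings, threshold=0.8):
    """Group similar subwords; return merged tokens and their vectors."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    groups = []  # each group is a list of member indices
    for i in range(len(subwords)):
        for g in groups:
            centroid = emb[g].mean(axis=0)
            centroid /= np.linalg.norm(centroid)
            if float(emb[i] @ centroid) >= threshold:
                g.append(i)
                break
        else:  # no sufficiently similar group: start a new one
            groups.append([i])
    tokens = [[subwords[i] for i in g] for g in groups]
    vectors = np.stack([embeddings[g].mean(axis=0) for g in groups])
    return tokens, vectors
```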
- Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings [23.822788597966646]
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content.
In this paper we explore semantic AWE modelling.
We show -- for the first time -- that AWEs can be used for downstream semantic query-by-example search.
arXiv Detail & Related papers (2023-07-05T07:46:54Z)
- Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories [47.03271152494389]
Word Sense Disambiguation aims to automatically identify the exact meaning of one word according to its context.
Existing supervised models struggle to make correct predictions on rare word senses due to limited training data.
We propose a gloss alignment algorithm that aligns definition sentences with the same meaning across different sense inventories to collect rich lexical knowledge (a sketch of such an alignment step follows this entry).
arXiv Detail & Related papers (2021-10-27T00:04:33Z)
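Below is a hedged sketch of what the gloss alignment step in the Connect-the-Dots entry could look like: glosses from two sense inventories are embedded and greedily paired by cosine similarity. The sentence encoder, the threshold, and the greedy one-to-one matching are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch of aligning definition sentences (glosses) across two
# sense inventories by embedding similarity. Encoder choice, threshold,
# and greedy one-to-one matching are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def align_glosses(glosses_a, glosses_b, threshold=0.75):
    """Greedily pair each gloss in inventory A with its most similar
    unused gloss in inventory B, keeping pairs above the threshold."""
    ea = model.encode(glosses_a, normalize_embeddings=True)
    eb = model.encode(glosses_b, normalize_embeddings=True)
    sim = ea @ eb.T  # cosine similarity matrix (rows: A, cols: B)
    pairs, used = [], set()
    for i in np.argsort(-sim.max(axis=1)):  # most confident rows first
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= threshold and j not in used:
            pairs.append((glosses_a[i], glosses_b[j], float(sim[i, j])))
            used.add(j)
    return pairs
```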
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z)
- WOVe: Incorporating Word Order in GloVe Word Embeddings [0.0]
Defining a word as a vector makes it easy for machine learning algorithms to understand a text and extract information from it.
Word vector representations have been used in many applications such as word synonyms, word analogy, syntactic parsing, and many others.
arXiv Detail & Related papers (2021-05-18T15:28:20Z)
- SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
arXiv Detail & Related papers (2020-12-30T15:38:26Z)
- SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change [58.87961226278285]
This paper describes SChME, a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change.
SChME uses a model ensemble combining signals from distributional models (word embeddings) and word frequency models, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature (a minimal sketch of this voting step follows this entry).
arXiv Detail & Related papers (2020-12-02T23:56:34Z)
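A minimal sketch of the voting scheme in the SChME entry above, assuming hypothetical feature names and a 0.5 decision threshold: each feature model emits a probability that the word underwent semantic change, and the ensemble soft-votes by averaging.

```python
# Minimal sketch of soft-voting over per-feature change probabilities.
# Feature names and the threshold are illustrative assumptions, not the
# paper's exact configuration.
import numpy as np

def ensemble_vote(probabilities: dict[str, float], threshold: float = 0.5):
    """Average each model's probability that a word underwent semantic
    change, then compare the mean vote against a decision threshold."""
    votes = np.array(list(probabilities.values()))
    score = float(votes.mean())
    return score, score >= threshold

# Example: hypothetical per-feature probabilities for one target word.
score, changed = ensemble_vote({
    "embedding_cosine_distance": 0.82,  # distributional signal
    "neighborhood_overlap": 0.67,       # distributional signal
    "frequency_shift": 0.41,            # word-frequency signal
})
print(f"mean vote = {score:.2f}, changed = {changed}")
```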
- Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation [5.8523859781812435]
We introduce the UWA (Unambiguous Word Annotations) dataset and show how a state-of-the-art propagation-based model can use it to extend the coverage and quality of its word sense embeddings.
arXiv Detail & Related papers (2020-04-29T16:51:21Z)
- Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture the local correspondence between definition words and sememes to predict sememes.
We evaluate our model and baseline methods on HowNet, a well-known sememe knowledge base, and find that our model achieves state-of-the-art performance (a rough sketch of the matching idea follows this entry).
arXiv Detail & Related papers (2020-01-16T17:30:36Z)
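As a rough illustration of the local-correspondence idea in the entry above, the sketch below scores each candidate sememe by its best cosine match against the words of a dictionary definition and max-pools over definition words; the embeddings and the pooling choice are assumptions for illustration, not the SCorP architecture.

```python
# Rough sketch of scoring candidate sememes against a dictionary
# definition by local matching: each sememe is scored by its maximum
# cosine similarity to any definition word (max-pooling). Embeddings
# and pooling choice are illustrative assumptions, not SCorP itself.
import numpy as np

def predict_sememes(def_vecs: np.ndarray, sememe_vecs: np.ndarray,
                    sememe_names: list[str], top_k: int = 5):
    """Rank candidate sememes for one definition.

    def_vecs:    (n_words, dim) embeddings of the definition's words
    sememe_vecs: (n_sememes, dim) embeddings of all candidate sememes
    """
    d = def_vecs / np.linalg.norm(def_vecs, axis=1, keepdims=True)
    s = sememe_vecs / np.linalg.norm(sememe_vecs, axis=1, keepdims=True)
    sim = s @ d.T                 # (n_sememes, n_words) cosine matrix
    scores = sim.max(axis=1)      # pool: best-matching definition word
    order = np.argsort(-scores)[:top_k]
    return [(sememe_names[i], float(scores[i])) for i in order]
```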
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.