Comparative Analysis of Word Embeddings for Capturing Word Similarities
- URL: http://arxiv.org/abs/2005.03812v1
- Date: Fri, 8 May 2020 01:16:03 GMT
- Title: Comparative Analysis of Word Embeddings for Capturing Word Similarities
- Authors: Martina Toshevska, Frosina Stojanovska and Jovan Kalajdjieski
- Abstract summary: Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most natural language processing models based on deep learning techniques use pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most natural language processing models based on deep learning techniques use pre-trained distributed word representations, commonly called word embeddings. Determining the highest-quality word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods and analyse their performance on capturing word similarities with existing benchmark datasets of word-pair similarities. We conduct a correlation analysis between ground-truth word similarities and the similarities obtained by different word embedding methods.
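To make the described evaluation concrete, here is a minimal sketch of the correlation analysis: cosine similarities from an embedding table are compared against human similarity judgments via Spearman's rank correlation. The toy vectors and word pairs below are invented for illustration; the paper itself evaluates pre-trained embeddings on established word-pair benchmarks.

```python
import numpy as np
from scipy.stats import spearmanr

# Toy 4-dimensional embeddings; a real experiment would load
# pre-trained vectors (e.g. word2vec, GloVe, fastText).
embeddings = {
    "car":    np.array([0.9, 0.1, 0.0, 0.2]),
    "auto":   np.array([0.8, 0.2, 0.1, 0.3]),
    "coast":  np.array([0.1, 0.9, 0.3, 0.0]),
    "shore":  np.array([0.2, 0.8, 0.4, 0.1]),
    "noon":   np.array([0.0, 0.1, 0.9, 0.8]),
    "string": np.array([0.3, 0.0, 0.1, 0.9]),
}

# Hypothetical benchmark rows: (word1, word2, human similarity score).
benchmark = [
    ("car", "auto", 8.9),
    ("coast", "shore", 9.1),
    ("noon", "string", 0.5),
]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

human = [score for _, _, score in benchmark]
model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in benchmark]

# Spearman's rank correlation between ground-truth and model similarities.
rho, p_value = spearmanr(human, model)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```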
Related papers
- A Comprehensive Analysis of Static Word Embeddings for Turkish [0.058520770038704165]
There are two main types of word embedding models: non-contextual (static) models and contextual models.
We compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish.
The results of the analyses provide insights about the suitability of different embedding models in different types of NLP tasks.
arXiv Detail & Related papers (2024-05-13T14:23:37Z)
- The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations [3.4668147567693453]
Our analysis reveals, among other interesting findings, that the quality of representations of words that are split is often, but not always, worse than that of the embeddings of known words (a subword pooling sketch follows this entry).
arXiv Detail & Related papers (2024-02-22T15:04:24Z)
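The entry above concerns words that a tokenizer splits into subword pieces. A common way to obtain a single vector for such a word is to pool the piece embeddings, as in the minimal sketch below; the piece inventory, the toy vectors, and the mean-pooling rule are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

# Hypothetical subword pieces: "unfathomable" is out-of-vocabulary
# and gets split, each piece carrying its own embedding.
piece_vectors = {
    "un":       np.random.default_rng(0).normal(size=8),
    "##fathom": np.random.default_rng(1).normal(size=8),
    "##able":   np.random.default_rng(2).normal(size=8),
}

def pool_pieces(pieces, table):
    """Mean-pool piece embeddings into one word representation.

    A common convention when a tokenizer splits a word; the paper's
    finding is that such representations are often (not always) of
    lower quality than embeddings of words kept whole."""
    return np.mean([table[p] for p in pieces], axis=0)

word_vec = pool_pieces(["un", "##fathom", "##able"], piece_vectors)
print(word_vec.shape)  # (8,)
```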
- A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations (a count-based sketch follows this entry), and they are not able to capture the semantic and syntactic regularities of the language very well.
On the other hand, Neural-network-based approaches can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations.
arXiv Detail & Related papers (2023-03-13T15:34:19Z)
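A minimal sketch of the count-based, matrix-factorization family contrasted in the entry above: build a co-occurrence matrix from a toy corpus, re-weight it with positive PMI, and factorize it with a truncated SVD. The corpus, the PPMI weighting, and the dimensionality are illustrative choices, not the paper's protocol.

```python
import numpy as np

# Toy corpus; a real setup would use a large corpus and a context window.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within each sentence.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j, c in enumerate(sent):
            if i != j:
                C[idx[w], idx[c]] += 1

# Positive pointwise mutual information (PPMI) re-weighting.
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Truncated SVD: rows of U scaled by singular values give d-dim vectors.
U, S, _ = np.linalg.svd(ppmi)
d = 2
vectors = U[:, :d] * S[:d]
print({w: vectors[idx[w]].round(2) for w in vocab})
```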
- Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models.
We find that word representations are position-biased: words appearing in the first positions of different contexts tend to receive more similar embeddings (a measurement sketch follows this entry).
arXiv Detail & Related papers (2022-08-20T12:27:25Z)
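One way the reported position bias could be quantified is sketched below: for contextual embeddings arranged as (contexts, positions, dim), compute the average cross-context cosine similarity at each position. The random tensor merely stands in for real model output (e.g. BERT activations); the measurement, not the data, is the point.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for contextual embeddings of different sentences:
# shape (num_contexts, sequence_length, hidden_dim).
emb = rng.normal(size=(16, 10, 32))

def mean_pairwise_cosine(x):
    """Average cosine similarity over all pairs of rows in x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    n = len(x)
    return (sims.sum() - n) / (n * (n - 1))  # exclude self-similarity

# If representations are position-biased, early positions should show
# higher cross-context similarity than later ones.
for pos in range(emb.shape[1]):
    print(pos, round(mean_pairwise_cosine(emb[:, pos, :]), 3))
```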
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method to explicitly enhance conventional word embedding with multiple-aspect senses from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Interactive Re-Fitting as a Technique for Improving Word Embeddings [0.0]
We make it possible for humans to adjust portions of a word embedding space by moving sets of words closer to one another.
Our approach allows users to trigger selective post-processing as they interact with and assess potential bias in word embeddings (a minimal adjustment sketch follows this entry).
arXiv Detail & Related papers (2020-09-30T21:54:22Z)
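A minimal sketch of the selective post-processing idea in the entry above: a user flags a set of words, and their vectors are interpolated toward the set's centroid. The interpolation rule and the `alpha` parameter are assumptions for illustration, not the paper's exact update.

```python
import numpy as np

def pull_together(vectors, words, selected, alpha=0.3):
    """Move a user-selected set of word vectors toward their centroid.

    A simple stand-in for the interactive adjustment described above;
    alpha controls how far each selected vector moves."""
    rows = [words.index(w) for w in selected]
    centroid = vectors[rows].mean(axis=0)
    vectors = vectors.copy()
    vectors[rows] = (1 - alpha) * vectors[rows] + alpha * centroid
    return vectors

words = ["nurse", "doctor", "engineer", "banana"]
rng = np.random.default_rng(0)
V = rng.normal(size=(len(words), 16))

# E.g. a user flags occupation terms that should sit closer together.
V_adjusted = pull_together(V, words, ["nurse", "doctor", "engineer"])
print(np.linalg.norm(V - V_adjusted))  # nonzero: selected rows moved
```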
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Multiplex Word Embeddings for Selectional Preference Acquisition [70.33531759861111]
We propose a multiplex word embedding model, which can be easily extended according to various relations among words.
Our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness (a minimal sketch of the multiplex idea follows this entry).
arXiv Detail & Related papers (2020-01-09T04:47:14Z)
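A minimal sketch of the multiplex idea in the entry above: each word keeps one center embedding plus a small offset per relation, and a relation-specific view combines the two. The additive composition and the 0.1 offset scale are assumptions; the paper's exact parameterisation may differ.

```python
import numpy as np

rng = np.random.default_rng(7)
DIM, RELATIONS = 16, ["subject-verb", "verb-object"]

def make_word():
    # One shared center vector plus a small offset per relation, so the
    # model stays compact while still distinguishing relation types.
    return {
        "center": rng.normal(size=DIM),
        "offsets": {r: 0.1 * rng.normal(size=DIM) for r in RELATIONS},
    }

lexicon = {w: make_word() for w in ["eat", "pizza", "chef"]}

def relation_vector(word, relation):
    """Relation-specific view of a word: center + offset (an assumed
    composition rule for illustration)."""
    e = lexicon[word]
    return e["center"] + e["offsets"][relation]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Score how plausible "pizza" is as the object of "eat".
print(cosine(relation_vector("eat", "verb-object"),
             relation_vector("pizza", "verb-object")))
```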
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.