Enhanced word embeddings using multi-semantic representation through
lexical chains
- URL: http://arxiv.org/abs/2101.09023v1
- Date: Fri, 22 Jan 2021 09:43:33 GMT
- Title: Enhanced word embeddings using multi-semantic representation through
lexical chains
- Authors: Terry Ruas, Charles Henrique Porto Ferreira, William Grosky,
Fabrício Olivetti de França, Débora Maria Rossi Medeiros
- Abstract summary: We propose two novel algorithms, called Flexible Lexical Chain II and Fixed Lexical Chain II.
These algorithms combine the semantic relations derived from lexical chains, prior knowledge from lexical databases, and the robustness of the distributional hypothesis in word embeddings as building blocks forming a single system.
Our results show that the integration of lexical chains and word embedding representations sustains state-of-the-art results, even against more complex systems.
- Score: 1.8199326045904998
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The relationship between words in a sentence often tells us more about the
underlying semantic content of a document than its actual words, individually.
In this work, we propose two novel algorithms, called Flexible Lexical Chain II
and Fixed Lexical Chain II. These algorithms combine the semantic relations
derived from lexical chains, prior knowledge from lexical databases, and the
robustness of the distributional hypothesis in word embeddings as building
blocks forming a single system. In short, our approach has three main
contributions: (i) a set of techniques that fully integrate word embeddings and
lexical chains; (ii) a more robust semantic representation that considers the
latent relation between words in a document; and (iii) lightweight word
embeddings models that can be extended to any natural language task. We intend
to assess the knowledge of pre-trained models to evaluate their robustness in
the document classification task. The proposed techniques are tested against
seven word embeddings algorithms using five different machine learning
classifiers over six scenarios in the document classification task. Our results
show that the integration of lexical chains and word embedding representations
sustains state-of-the-art results, even against more complex systems.
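To make the combination concrete, the following is a minimal Python sketch of the general idea: tokens are greedily grouped into lexical chains by WordNet relatedness, and each chain is represented by the centroid of its members' embeddings. This is an illustrative approximation under stated assumptions (the greedy chaining rule, the 0.2 similarity threshold, and the GloVe model are all choices made here), not the authors' FLLC II or FXLC II algorithms.

```python
# Illustrative sketch only -- NOT the FLLC II / FXLC II algorithms.
# Requires: nltk.download('wordnet') and a gensim embedding download.
import numpy as np
from nltk.corpus import wordnet as wn
import gensim.downloader as api

embeddings = api.load("glove-wiki-gigaword-100")  # assumed model choice

def related(w1, w2, threshold=0.2):
    """Treat two words as chainable if any synset pair is close in WordNet."""
    return any((s1.path_similarity(s2) or 0.0) >= threshold
               for s1 in wn.synsets(w1) for s2 in wn.synsets(w2))

def build_chains(tokens):
    """Greedily append each token to the first chain whose last word relates to it."""
    chains = []
    for tok in tokens:
        for chain in chains:
            if related(tok, chain[-1]):
                chain.append(tok)
                break
        else:
            chains.append([tok])
    return chains

def document_vector(tokens):
    """Average the embedding centroid of each lexical chain."""
    chain_vecs = [np.mean([embeddings[w] for w in chain if w in embeddings], axis=0)
                  for chain in build_chains(tokens)
                  if any(w in embeddings for w in chain)]
    return np.mean(chain_vecs, axis=0)

print(document_vector(["bank", "money", "loan", "river", "water"]).shape)  # (100,)
```

The resulting document vector can then be fed to any downstream classifier; the chains make the representation reflect groups of semantically related words rather than isolated tokens.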
Related papers
- Unifying Latent and Lexicon Representations for Effective Video-Text
Retrieval [87.69394953339238]
We propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics in video-text retrieval.
We show our framework largely outperforms previous video-text retrieval methods, with 4.8% and 8.2% Recall@1 improvement on MSR-VTT and DiDeMo respectively.
arXiv Detail & Related papers (2024-02-26T17:36:50Z) - Domain Embeddings for Generating Complex Descriptions of Concepts in
Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z) - Towards Unsupervised Recognition of Token-level Semantic Differences in
Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning correlates robustly with gold labels.
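The alignment-based scoring can be sketched in a few lines: a token counts as "different" when no token in the other document has a nearby embedding. This is a hedged illustration; the masked-language-model encoder that would produce the token embeddings is abstracted away, and the cosine-based rule is an assumption rather than the paper's exact formulation.

```python
# Sketch: token-level difference scores from embedding alignment.
# A, B are (n, d) and (m, d) token-embedding matrices for two documents;
# in the paper's setting these would come from a masked language model.
import numpy as np

def difference_scores(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return one score per token of A: 1 - max cosine similarity to B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - (A @ B.T).max(axis=1)

A, B = np.random.rand(5, 8), np.random.rand(7, 8)
print(difference_scores(A, B))  # 5 scores; higher = more likely changed
```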
arXiv Detail & Related papers (2023-05-22T17:58:04Z) - A Comprehensive Empirical Evaluation of Existing Word Embedding
Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well.
On the other hand, Neural-network-based approaches can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations.
arXiv Detail & Related papers (2023-03-13T15:34:19Z) - Textual Entailment Recognition with Semantic Features from Empirical
Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth value of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ a feature based on the element-wise Manhattan distance vector between the text and hypothesis embeddings, which identifies the semantic entailment relationship between the text-hypothesis pair.
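Stripped of the encoder and classifier (which are not specified here), the feature itself is just an element-wise absolute difference between the two sentence vectors; a minimal sketch:

```python
# Element-wise Manhattan distance feature for a text-hypothesis pair.
# The sentence vectors would come from some embedding model (unspecified here).
import numpy as np

def manhattan_feature(text_vec: np.ndarray, hyp_vec: np.ndarray) -> np.ndarray:
    """Element-wise |text - hypothesis|, used as a classifier input feature."""
    return np.abs(text_vec - hyp_vec)

print(manhattan_feature(np.array([0.2, 0.7, 0.1]),
                        np.array([0.1, 0.9, 0.1])))  # [0.1 0.2 0. ]
```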
arXiv Detail & Related papers (2022-10-18T10:03:51Z) - DyLex: Incorporating Dynamic Lexicons into BERT for Sequence Labeling [49.3379730319246]
We propose DyLex, a plug-in lexicon incorporation approach for BERT based sequence labeling tasks.
We adopt word-agnostic tag embeddings to avoid re-training the representation while updating the lexicon.
Finally, we introduce a column-wise attention-based knowledge fusion mechanism to guarantee the pluggability of the proposed framework.
arXiv Detail & Related papers (2021-09-18T03:15:49Z) - LexSubCon: Integrating Knowledge from Lexical Resources into Contextual
Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z) - A comprehensive empirical analysis on cross-domain semantic enrichment
for detection of depressive language [0.9749560288448115]
We start with a rich word embedding pre-trained from a large general dataset, which is then augmented with embeddings learned from a much smaller and more specific domain dataset through a simple non-linear mapping mechanism.
We show that our augmented word embedding representations achieve a significantly better F1 score than the others, especially when applied to a high-quality dataset.
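One plausible reading of the "simple non-linear mapping mechanism", sketched below with scikit-learn: fit a small MLP from general-domain vectors to domain-specific vectors on the shared vocabulary, then project the entire general vocabulary into the domain space. The hidden-layer size and training setup are assumptions, not the paper's reported configuration.

```python
# Sketch: augment general-purpose embeddings with a learned non-linear map
# into a domain-specific embedding space (assumed architecture, not the paper's).
import numpy as np
from sklearn.neural_network import MLPRegressor

def augment(general_vecs: dict, domain_vecs: dict) -> dict:
    """Both arguments map word -> vector; returns domain-adapted general vectors."""
    shared = [w for w in general_vecs if w in domain_vecs]
    X = np.stack([general_vecs[w] for w in shared])
    Y = np.stack([domain_vecs[w] for w in shared])
    mlp = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500).fit(X, Y)
    # Every word in the large general vocabulary gets a domain-adapted vector,
    # including words never seen in the small domain corpus.
    return {w: mlp.predict(v[None, :])[0] for w, v in general_vecs.items()}
```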
arXiv Detail & Related papers (2021-06-24T07:15:09Z) - Top2Vec: Distributed Representations of Topics [0.0]
Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents.
We present top2vec, which leverages joint document and word semantic embedding to find topics.
Our experiments demonstrate that top2vec finds topics that are significantly more informative and representative of the corpus it was trained on than those found by probabilistic generative models.
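A condensed sketch of the idea: embed documents and words in one space, cluster the document vectors, and read each topic off as the words nearest the cluster centroid. The published method uses UMAP and HDBSCAN for density-based clustering; KMeans stands in here to keep the example self-contained.

```python
# Sketch of the top2vec idea with gensim's Doc2Vec as the joint embedding.
# (The real method uses UMAP + HDBSCAN instead of KMeans.)
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

docs = [["cats", "purr", "meow"], ["dogs", "bark", "fetch"],
        ["stocks", "market", "trade"], ["bonds", "yield", "market"]]
model = Doc2Vec([TaggedDocument(d, [i]) for i, d in enumerate(docs)],
                vector_size=16, min_count=1, epochs=100)

# Each cluster centroid of the document vectors acts as a "topic vector".
doc_vecs = [model.dv[i] for i in range(len(docs))]
centroids = KMeans(n_clusters=2, n_init=10).fit(doc_vecs).cluster_centers_

# Topic words are the word vectors nearest each topic vector.
for c in centroids:
    print(model.wv.similar_by_vector(c, topn=3))
```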
arXiv Detail & Related papers (2020-08-19T20:58:27Z) - Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.