Using word embeddings to improve the discriminability of co-occurrence
text networks
- URL: http://arxiv.org/abs/2003.06279v1
- Date: Fri, 13 Mar 2020 13:35:44 GMT
- Title: Using word embeddings to improve the discriminability of co-occurrence
text networks
- Authors: Laura V. C. Quispe and Jorge A. V. Tohalino and Diego R. Amancio
- Abstract summary: We investigate whether the use of word embeddings as a tool to create virtual links in co-occurrence networks may improve the quality of classification systems.
Our results revealed that discriminability in the stylometry task is improved when using GloVe, Word2Vec, and FastText.
- Score: 0.1611401281366893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word co-occurrence networks have been employed to analyze texts in
both practical and theoretical scenarios. Despite their relative success in
several applications, traditional co-occurrence networks fail to establish
links between similar words whenever those words appear distant in the text.
Here we investigate whether using word embeddings to create virtual links in
co-occurrence networks can improve the quality of classification systems. Our
results revealed that discriminability in the stylometry task is improved when
using GloVe, Word2Vec, and FastText. In addition, we found that the best
results are obtained when stopwords are retained and a simple global
thresholding strategy is used to establish virtual links. Because the proposed
approach improves the representation of texts as complex networks, we believe
it could be extended to other natural language processing tasks. Likewise,
theoretical language studies could benefit from the adopted enriched
representation of word co-occurrence networks.
Related papers
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer [88.61312640540902]
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2023-08-20T03:22:23Z)
- Topological properties and organizing principles of semantic networks [3.8462776107938317]
We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages.
We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions.
In some networks the connections are similarity-based, while in others the connections are more complementarity-based.
arXiv Detail & Related papers (2023-04-24T11:12:21Z)
- Disentangling Learnable and Memorizable Data via Contrastive Learning for Semantic Communications [81.10703519117465]
A novel machine reasoning framework is proposed to disentangle source data so as to make it semantic-ready.
In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data.
Deep semantic clusters of highest confidence are considered learnable, semantic-rich data.
Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism.
arXiv Detail & Related papers (2022-12-18T12:00:12Z)
- Less Data, More Knowledge: Building Next Generation Semantic Communication Networks [180.82142885410238]
We present the first rigorous vision of a scalable end-to-end semantic communication network.
We first discuss how the design of semantic communication networks requires a move from data-driven networks towards knowledge-driven ones.
By using semantic representation and languages, we show that the traditional transmitter and receiver now become a teacher and apprentice.
arXiv Detail & Related papers (2022-11-25T19:03:25Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge, namely structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
- Using virtual edges to extract keywords from texts modeled as complex networks [0.1611401281366893]
We modeled texts as co-occurrence networks, where nodes are words and edges are established by contextual or semantic similarity.
We found that, in fact, the use of virtual edges can improve the discriminability of co-occurrence networks.
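Once such an enriched network is built, keyword extraction amounts to ranking nodes by a network measurement. The sketch below uses plain degree as an illustrative, simplified stand-in; the actual paper evaluates several centrality measures, and the function name and edge representation (edges as frozensets of word pairs) are assumptions for this example.

```python
from collections import Counter

def degree_keywords(edges, top_k=3):
    # Rank words by degree in the (virtual-edge-enriched) network;
    # high-degree nodes serve as keyword candidates.
    deg = Counter()
    for edge in edges:
        for word in edge:
            deg[word] += 1
    return [word for word, _ in deg.most_common(top_k)]
```

Because virtual edges connect semantically related words that never co-occur closely, a central concept accumulates degree from both its textual neighbors and its embedding neighbors, which is the mechanism behind the improved discriminability reported above.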
arXiv Detail & Related papers (2022-05-04T16:43:03Z)
- Language Semantics Interpretation with an Interaction-based Recurrent Neural Networks [0.0]
This paper proposes a novel influence score (I-score), a greedy search algorithm called the Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the "dagger technique".
The proposed methods improve prediction performance, achieving an 81% error reduction compared with other popular peers.
arXiv Detail & Related papers (2021-11-02T00:39:21Z)
- Unsupervised Word Translation Pairing using Refinement based Point Set Registration [8.568050813210823]
Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages.
Current unsupervised approaches rely on similarities in geometric structure of word embedding spaces across languages.
This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space.
arXiv Detail & Related papers (2020-11-26T09:51:29Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model, hypergraph attention networks (HyperGAT), which obtains more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Methods perform well on images with words within the vocabulary but generalize poorly to images with words outside it.
We call this phenomenon "vocabulary reliance".
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.