Related papers: Using virtual edges to extract keywords from texts modeled as complex networks

URL: http://arxiv.org/abs/2205.02172v1
Date: Wed, 4 May 2022 16:43:03 GMT
Title: Using virtual edges to extract keywords from texts modeled as complex networks
Authors: Jorge A. V. Tohalino and Thiago C. Silva and Diego R. Amancio
Abstract summary: We modeled texts as co-occurrence networks, where nodes are words and edges are established by contextual or semantic similarity. We found that the use of virtual edges can improve the discriminability of co-occurrence networks.
Score: 0.1611401281366893
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Detecting keywords in texts is important for many text mining applications. Graph-based methods have been commonly used to automatically find the key concepts in texts; however, relevant information provided by embeddings has not been widely used to enrich the graph structure. Here we modeled texts as co-occurrence networks, where nodes are words and edges are established either by contextual or semantic similarity. We compared two embedding approaches (Word2vec and BERT) to check whether edges created via word embeddings can improve the quality of the keyword extraction method. We found that the use of virtual edges can improve the discriminability of co-occurrence networks. The best performance was obtained when we added low percentages of virtual (embedding) edges. A comparative analysis of structural and dynamical network metrics revealed that degree, PageRank, and accessibility are the metrics displaying the best performance in the model enriched with virtual edges.
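The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window size, the toy 2-D vectors standing in for Word2vec or BERT embeddings, the rule of adding the most similar non-adjacent pairs up to a fraction of the original edge count, and all function names are assumptions made for illustration. Words are ranked by degree, one of the metrics the abstract names.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_edges(tokens, window=1):
    """Link each word to the words that follow it within `window` positions."""
    edges = set()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                edges.add(frozenset((word, other)))
    return edges


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def add_virtual_edges(edges, embeddings, fraction):
    """Add the most similar non-adjacent word pairs as virtual edges.

    Assumption: the number of virtual edges is `fraction` times the number
    of co-occurrence edges, rounded down.
    """
    candidates = [
        (cosine(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if frozenset((a, b)) not in edges
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    budget = int(fraction * len(edges))
    virtual = {frozenset((a, b)) for _, a, b in candidates[:budget]}
    return edges | virtual


def degree_ranking(edges):
    """Return words sorted by decreasing degree (ties broken alphabetically)."""
    degree = {}
    for edge in edges:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    return sorted(degree, key=lambda w: (-degree[w], w))


if __name__ == "__main__":
    tokens = ["graph", "methods", "find", "keywords",
              "graph", "embeddings", "enrich", "graph"]
    # Hypothetical toy vectors; a real run would load trained embeddings.
    toy_embeddings = {
        "graph": [1.0, 0.0], "methods": [0.9, 0.1], "find": [0.0, 1.0],
        "keywords": [0.1, 0.9], "embeddings": [0.8, 0.6], "enrich": [0.6, 0.8],
    }
    base = cooccurrence_edges(tokens)
    enriched = add_virtual_edges(base, toy_embeddings, fraction=0.3)
    print(degree_ranking(enriched)[:3])
```

Keeping `fraction` small mirrors the abstract's observation that low percentages of virtual edges performed best; swapping `degree_ranking` for a PageRank or accessibility computation would follow the same pattern.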

Related papers

A Survey on Deep Text Hashing: Efficient Semantic Text Retrieval with Binary Representation [69.50397417361351]
Text hashing projects original texts into compact binary hash codes. Deep text hashing has demonstrated significant advantages over traditional, data-independent hashing techniques. This survey investigates current deep text hashing methods by categorizing them based on their core components.
arXiv Detail & Related papers (2025-10-31T06:51:37Z)
Probing the statistical properties of enriched co-occurrence networks [0.0]
This study investigates two key statistical properties of text-based network models. We show that incorporating virtual edges can have positive and negative effects, depending on the specific network metric. Our results can serve as a guideline for determining which network metrics are most appropriate for specific applications.
arXiv Detail & Related papers (2024-12-03T18:38:14Z)
Attention based End to end network for Offline Writer Identification on Word level data [3.5829161769306244]
We propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN). The system is trained utilizing image segments, known as fragments, extracted from word images, employing a pyramid-based strategy. The efficacy of the proposed algorithm is evaluated on three benchmark databases.
arXiv Detail & Related papers (2024-04-11T09:41:14Z)
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer [88.61312640540902]
We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter). Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder. Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2023-08-20T03:22:23Z)
Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks [30.49672654211631]
Edgeformers is a framework built upon graph-enhanced Transformers to perform edge and node representation learning. We show that Edgeformers consistently outperform state-of-the-art baselines in edge classification and link prediction.
arXiv Detail & Related papers (2023-02-21T23:09:17Z)
TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
We propose a novel text-rich graph neural network with external knowledge (TeKo). We first present a flexible heterogeneous semantic network that incorporates high-quality entities. We then introduce two types of external knowledge, that is, structured triplets and unstructured entity descriptions.
arXiv Detail & Related papers (2022-06-15T02:33:10Z)
StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks [31.76016966100244]
StrokeNet is proposed to effectively detect the texts by capturing the fine-grained strokes. Different from existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize strokes of each text instance.
arXiv Detail & Related papers (2021-11-23T08:26:42Z)
Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
Adversarial Context Aware Network Embeddings for Textual Networks [8.680676599607123]
Existing approaches learn embeddings of text and network structure by enforcing embeddings of connected nodes to be similar. This implies that these approaches require edge information for learning embeddings and they cannot learn embeddings of unseen nodes. We propose an approach that achieves both modality fusion and the capability to learn embeddings of unseen nodes.
arXiv Detail & Related papers (2020-11-05T05:20:01Z)
Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task. Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words. We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in the KE task and has obtained competitive performance on various benchmarks. In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation. Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning. During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
Using word embeddings to improve the discriminability of co-occurrence text networks [0.1611401281366893]
We investigate whether the use of word embeddings as a tool to create virtual links in co-occurrence networks may improve the quality of classification systems. Our results revealed that the discriminability in the stylometry task is improved when using GloVe, Word2Vec and FastText.
arXiv Detail & Related papers (2020-03-13T13:35:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.