Using virtual edges to extract keywords from texts modeled as complex
  networks
- URL: http://arxiv.org/abs/2205.02172v1
- Date: Wed, 4 May 2022 16:43:03 GMT
- Title: Using virtual edges to extract keywords from texts modeled as complex
  networks
- Authors: Jorge A. V. Tohalino and Thiago C. Silva and Diego R. Amancio
- Abstract summary: We modeled texts as co-occurrence networks, where nodes are words and edges are established by contextual or semantic similarity.
We found that, in fact, the use of virtual edges can improve the discriminability of co-occurrence networks.
- Score: 0.1611401281366893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting keywords in texts is important for many text mining applications.
Graph-based methods have been commonly used to automatically find the key
concepts in texts; however, relevant information provided by embeddings has not
been widely used to enrich the graph structure. Here we modeled texts as
co-occurrence networks, where nodes are words and edges are established either
by contextual or semantic similarity. We compared two embedding approaches --
Word2vec and BERT -- to check whether edges created via word embeddings can
improve the quality of the keyword extraction method. We found that, in fact,
the use of virtual edges can improve the discriminability of co-occurrence
networks. The best performance was obtained when we considered low percentages
of addition of virtual (embedding) edges. A comparative analysis of structural
and dynamical network metrics revealed that degree, PageRank, and accessibility
are the metrics displaying the best performance in the model enriched with
virtual edges.
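
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for Word2vec or BERT embeddings, the helper names (cooccurrence_edges, add_virtual_edges, degree_ranking) are hypothetical, and the rule "add a number of virtual edges equal to a fraction of the co-occurrence edges" is an assumption about how the percentage of added edges might be defined. Only the degree ranking is shown; PageRank and accessibility are omitted.

```python
import math
from itertools import combinations


def cooccurrence_edges(tokens):
    """Link every pair of adjacent, distinct words with an undirected edge."""
    edges = set()
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            edges.add(frozenset((a, b)))
    return edges


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def add_virtual_edges(edges, embeddings, fraction):
    """Add the most similar unlinked word pairs as virtual edges.

    Assumption: the number of added edges is `fraction` times the number
    of co-occurrence edges, rounded to the nearest integer.
    """
    words = sorted(embeddings)
    candidates = [
        (a, b) for a, b in combinations(words, 2)
        if frozenset((a, b)) not in edges
    ]
    candidates.sort(
        key=lambda p: cosine(embeddings[p[0]], embeddings[p[1]]),
        reverse=True,
    )
    k = round(fraction * len(edges))
    return set(edges) | {frozenset(p) for p in candidates[:k]}


def degree_ranking(edges):
    """Rank words by degree (highest first), breaking ties alphabetically."""
    degree = {}
    for edge in edges:
        for word in edge:
            degree[word] = degree.get(word, 0) + 1
    return sorted(degree, key=lambda w: (-degree[w], w))


# Toy input: the tokens and vectors are made up for illustration only.
tokens = ["network", "keyword", "graph", "keyword", "text"]
toy_embeddings = {
    "network": [1.0, 0.0],
    "graph": [0.9, 0.1],
    "keyword": [0.0, 1.0],
    "text": [0.1, 0.9],
}

base = cooccurrence_edges(tokens)
enriched = add_virtual_edges(base, toy_embeddings, fraction=0.34)
print(degree_ranking(enriched))
```

In this toy input, "graph" and "network" never appear next to each other, so only a virtual edge can connect them; the top-ranked words by degree are then read off as keyword candidates.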

Related papers
        - Probing the statistical properties of enriched co-occurrence networks [0.0]
 This study investigates two key statistical properties of text-based network models.
We show that incorporating virtual edges can have positive and negative effects, depending on the specific network metric.
Our results can serve as a guideline for determining which network metrics are most appropriate for specific applications.
 arXiv (2024-12-03T18:38:14Z)
- Attention based End to end network for Offline Writer Identification on Word level data [3.5829161769306244]
 We propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN).
The system is trained utilizing image segments, known as fragments, extracted from word images, employing a pyramid-based strategy.
The efficacy of the proposed algorithm is evaluated on three benchmark databases.
 arXiv (2024-04-11T09:41:14Z)
- ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy
  in Transformer [88.61312640540902]
 We introduce the Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter).
Our model achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.
 Experimental results demonstrate that our model significantly outperforms previous state-of-the-art methods.
 arXiv (2023-08-20T03:22:23Z)
- Edgeformers: Graph-Empowered Transformers for Representation Learning on
  Textual-Edge Networks [30.49672654211631]
 Edgeformers is a framework built upon graph-enhanced Transformers to perform edge and node representation learning.
We show that Edgeformers consistently outperform state-of-the-art baselines in edge classification and link prediction.
 arXiv (2023-02-21T23:09:17Z)
- TeKo: Text-Rich Graph Neural Networks with External Knowledge [75.91477450060808]
 We propose a novel text-rich graph neural network with external knowledge (TeKo).
We first present a flexible heterogeneous semantic network that incorporates high-quality entities.
We then introduce two types of external knowledge, that is, structured triplets and unstructured entity description.
 arXiv (2022-06-15T02:33:10Z)
- StrokeNet: Stroke Assisted and Hierarchical Graph Reasoning Networks [31.76016966100244]
 StrokeNet is proposed to detect text effectively by capturing fine-grained strokes.
Different from existing approaches that represent the text area by a series of points or rectangular boxes, we directly localize strokes of each text instance.
 arXiv (2021-11-23T08:26:42Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on
  Text-Rich Networks [61.23408995934415]
 We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
 arXiv (2021-02-23T04:14:34Z)
- Adversarial Context Aware Network Embeddings for Textual Networks [8.680676599607123]
 Existing approaches learn embeddings of text and network structure by enforcing embeddings of connected nodes to be similar.
This implies that these approaches require edge information for learning embeddings and they cannot learn embeddings of unseen nodes.
We propose an approach that achieves both modality fusion and the capability to learn embeddings of unseen nodes.
 arXiv (2020-11-05T05:20:01Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text
  Classification [56.98218530073927]
 Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
 arXiv (2020-11-01T00:21:59Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and
  Diversified Inference [50.768682650658384]
 Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
 arXiv (2020-10-24T08:11:23Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
 We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
 arXiv (2020-06-21T14:10:47Z)
- Using word embeddings to improve the discriminability of co-occurrence
  text networks [0.1611401281366893]
 We investigate whether the use of word embeddings as a tool to create virtual links in co-occurrence networks may improve the quality of classification systems.
Our results revealed that the discriminability in the stylometry task is improved when using GloVe, Word2Vec and FastText.
 arXiv (2020-03-13T13:35:44Z)