Related papers: Graph-based Semantical Extractive Text Analysis

URL: http://arxiv.org/abs/2212.09701v1
Date: Mon, 19 Dec 2022 18:30:26 GMT
Title: Graph-based Semantical Extractive Text Analysis
Authors: Mina Samizadeh
Abstract summary: In this work, we improve the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous amount of data requires us to adopt effective computational tools to explore it. This has led to intense and growing interest in the research community in developing computational methods for processing text data. One line of study focuses on condensing text so that we can gain a higher-level understanding in less time. The two important tasks here are keyword extraction and text summarization. In keyword extraction, we are interested in finding the key words of a text, which familiarizes us with its general topic. In text summarization, we are interested in producing a short text that includes the important information in the document. The TextRank algorithm, an unsupervised learning method that extends PageRank (the base algorithm of the Google search engine for searching and ranking pages), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and declare them as the result. However, it neglects the semantic similarity between the different parts. In this work, we improve the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used individually or as part of generating the summary to overcome coverage problems.
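
The TextRank idea described in the abstract can be illustrated with a minimal sketch: sentences become graph nodes, edge weights come from a pairwise similarity, and a weighted PageRank iteration scores each sentence. This is not the paper's implementation. The abstract does not say which semantic similarity measure is used, so the sketch assumes a plain bag-of-words cosine similarity as a stand-in, and the names `tokenize`, `cosine`, `textrank`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a similarity graph.

    ASSUMPTION: bag-of-words cosine stands in for the semantic
    similarity that the paper incorporates; swap in any other
    pairwise similarity function to change the edge weights.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Edge weights: pairwise similarity, with no self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to
            # the weight of edge (j, i) among all of j's edges.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the two sentences that are most strongly connected to the rest of the text. Keyword extraction follows the same pattern with words as nodes instead of sentences.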

Related papers

A Survey on Deep Text Hashing: Efficient Semantic Text Retrieval with Binary Representation [69.50397417361351]
Text hashing projects original texts into compact binary hash codes. Deep text hashing has demonstrated significant advantages over traditional, data-independent hashing techniques. This survey investigates current deep text hashing methods by categorizing them based on their core components.
arXiv Detail & Related papers (2025-10-31T06:51:37Z)
Automatic summarisation of Instagram social network posts Combining semantic and statistical approaches [0.0]
A crawler has been developed to extract popular text posts from the Instagram social network with appropriate preprocessing. Observations made on 820 popular text posts on the social network Instagram show the accuracy (80%) of the proposed system.
arXiv Detail & Related papers (2023-03-14T14:59:20Z)
Uzbek text summarization based on TF-IDF [0.0]
This article presents an experiment on the summarization task for the Uzbek language. The methodology is text abstracting based on the TF-IDF algorithm. We summarize the given text by applying the n-gram method to important parts of the whole text.
arXiv Detail & Related papers (2023-03-01T12:39:46Z)
TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents. Text reading and information extraction can reinforce each other via a well-designed multi-modal context block. The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z)
Information Retrieval in Friction Stir Welding of Aluminum Alloys by using Natural Language Processing based Algorithms [0.0]
Text summarization is a technique for condensing a big piece of text into a few key elements that give a general impression of the content. Natural Language Processing (NLP) is the sub-division of Artificial Intelligence that narrows down the gap between technology and human cognition.
arXiv Detail & Related papers (2022-04-25T16:36:00Z)
Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problem. The basic idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information. Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
arXiv Detail & Related papers (2021-01-16T10:34:03Z)
Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach. The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features. Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
Biased TextRank: Unsupervised Graph-Based Content Extraction [26.54218341713572]
Biased TextRank is a graph-based content extraction method inspired by the popular TextRank algorithm. We present two applications of Biased TextRank: focused summarization and explanation extraction.
arXiv Detail & Related papers (2020-11-02T15:17:44Z)
BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation [17.003488045214972]
Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available. In developing a methodology to handle single documents, we face two major challenges. First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms. Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments.
arXiv Detail & Related papers (2020-08-05T16:34:33Z)
TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network. Multimodal visual and textual features of text reading are fused for information extraction. Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
arXiv Detail & Related papers (2020-05-27T01:47:26Z)
From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document. In real-world applications, most of the data is not in a plain text format. This paper focuses on the survey of these new summarization tasks and approaches in the real-world application.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems. We formulate the extractive summarization task as a semantic text matching problem. We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.