Graph-based Semantical Extractive Text Analysis
- URL: http://arxiv.org/abs/2212.09701v1
- Date: Mon, 19 Dec 2022 18:30:26 GMT
- Title: Graph-based Semantical Extractive Text Analysis
- Authors: Mina Samizadeh
- Abstract summary: In this work, we improve the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text.
Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the past few decades, there has been an explosion in the amount of
available data produced from various sources on different topics. The
availability of this enormous volume of data requires us to adopt effective
computational tools to explore it. This has led to intense and growing
interest in the research community in developing computational methods for
processing this text data. One line of study focuses on condensing the text so
that we can reach a higher level of understanding in a shorter time. The
two important tasks here are keyword extraction and text summarization.
In keyword extraction, we are interested in finding the most important words
in a text, which familiarizes us with its general topic. In text
summarization, we are interested in producing a short text that
includes the important information in the document. The TextRank algorithm, an
unsupervised learning method that extends PageRank (the algorithm
underlying the Google search engine for searching and ranking pages),
has shown its efficacy in large-scale text mining, especially for
text summarization and keyword extraction. This algorithm can automatically
extract the important parts of a text (keywords or sentences) and return them
as the result. However, it neglects the semantic similarity between
the different parts. In this work, we improve the results of the TextRank
algorithm by incorporating the semantic similarity between parts of the text.
Aside from keyword extraction and text summarization, we develop a topic
clustering algorithm based on our framework, which can be used on its own or
as part of summary generation to overcome coverage problems.
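
The idea in the abstract, ranking text units with a PageRank-style walk over a graph whose edges carry semantic similarity, can be sketched as follows. This is a minimal illustration and not the paper's method: the function names `cosine` and `semantic_textrank`, the choice of cosine similarity over precomputed embedding vectors, and the damping value of 0.85 are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_textrank(vectors, damping=0.85, iterations=50):
    """Score text units by weighted PageRank over a similarity graph.

    `vectors` holds one embedding per sentence or keyword candidate.
    How the embeddings are produced is outside this sketch (assumption).
    """
    n = len(vectors)
    if n == 0:
        return []

    # Edge weight = semantic similarity; negative similarities are clipped
    # to zero and self-loops are excluded.
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of edge (j, i).
            incoming = sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores
```

Calling `semantic_textrank` with one vector per sentence returns one score per sentence; taking the highest-scoring sentences gives an extractive summary, and running the same routine on word vectors gives keyword candidates.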
Related papers
- Automatic summarisation of Instagram social network posts Combining
semantic and statistical approaches [0.0]
A crawler has been developed to extract popular text posts from the Instagram social network with appropriate preprocessing.
Observations made on 820 popular Instagram text posts show an accuracy of 80% for the proposed system.
(2023-03-14T14:59:20Z)
- Uzbek text summarization based on TF-IDF [0.0]
This article presents an experiment on the summarization task for the Uzbek language.
The methodology is based on text abstracting using the TF-IDF algorithm.
We summarize the given text by applying the n-gram method to important parts of the whole text.
(2023-03-01T12:39:46Z)
- TRIE++: Towards End-to-End Information Extraction from Visually Rich
Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained end to end, achieving global optimization.
(2022-07-14T08:52:07Z)
- Information Retrieval in Friction Stir Welding of Aluminum Alloys by
using Natural Language Processing based Algorithms [0.0]
Text summarization is a technique for condensing a big piece of text into a few key elements that give a general impression of the content.
Natural Language Processing (NLP) is the sub-division of Artificial Intelligence that narrows down the gap between technology and human cognition.
(2022-04-25T16:36:00Z)
- Match-Ignition: Plugging PageRank into Transformer for Long-form Text
Matching [66.71886789848472]
We propose a novel hierarchical noise filtering model, namely Match-Ignition, to tackle the effectiveness and efficiency problem.
The basic idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information.
Noisy sentences are usually easy to detect because the sentence is the basic unit of a long-form text, so we directly use PageRank to filter such information.
(2021-01-16T10:34:03Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
(2020-11-18T17:42:32Z)
- Biased TextRank: Unsupervised Graph-Based Content Extraction [26.54218341713572]
Biased TextRank is a graph-based content extraction method inspired by the popular TextRank algorithm.
We present two applications of Biased TextRank: focused summarization and explanation extraction.
(2020-11-02T15:17:44Z)
- BATS: A Spectral Biclustering Approach to Single Document Topic Modeling
and Segmentation [17.003488045214972]
Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available.
In developing a methodology to handle single documents, we face two major challenges.
First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms.
Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments.
(2020-08-05T16:34:33Z)
- TRIE: End-to-End Text Reading and Information Extraction for Document
Understanding [56.1416883796342]
We propose a unified end-to-end text reading and information extraction network.
Multimodal visual and textual features of text reading are fused for information extraction.
Our proposed method significantly outperforms the state-of-the-art methods in both efficiency and accuracy.
(2020-05-27T01:47:26Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with
Manifold Information [77.89755281215079]
Text summarization is the research area aiming at creating a short and condensed version of the original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches in real-world applications.
(2020-05-10T14:59:36Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
(2020-04-19T08:27:57Z)
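
One of the related entries above bases summarization on TF-IDF. A rough sketch of that general idea, scoring sentences by the TF-IDF weight of their words and keeping the top ones, might look like the following. It is not taken from that paper: the function names are hypothetical, each sentence is treated as its own "document" for the IDF term, tokens are plain unigrams instead of n-grams, and the tokenizer is a simple regular expression.

```python
import math
import re
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Return one TF-IDF-based score per sentence (assumed scoring scheme)."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokenized)

    # Document frequency: in how many sentences each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        term_freq = Counter(tokens)
        # Sum of (relative term frequency) * (log inverse document frequency).
        score = sum(
            (count / len(tokens)) * math.log(n / doc_freq[term])
            for term, count in term_freq.items()
        )
        scores.append(score)
    return scores


def summarize(sentences, k=1):
    """Keep the k highest-scoring sentences, in their original order."""
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Terms that occur in every sentence get an IDF of zero and so contribute nothing, which is why sentences made only of common words score low under this scheme.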
This list is automatically generated from the titles and abstracts of the papers on this site.