Word Embedding-based Text Processing for Comprehensive Summarization and
Distinct Information Extraction
- URL: http://arxiv.org/abs/2004.09719v1
- Date: Tue, 21 Apr 2020 02:43:31 GMT
- Title: Word Embedding-based Text Processing for Comprehensive Summarization and
Distinct Information Extraction
- Authors: Xiangpeng Wan, Hakim Ghazzai, and Yehia Massoud
- Abstract summary: We propose two automated text processing frameworks specifically designed to analyze online reviews.
The first framework summarizes the review dataset by extracting essential sentences.
The second framework is based on a question-answering neural network model trained to extract answers to multiple different questions.
- Score: 1.552282932199974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose two automated text processing frameworks
specifically designed to analyze online reviews. The objective of the first
framework is to summarize the reviews dataset by extracting essential sentences.
This is performed by converting sentences into numerical vectors and clustering
them using a community detection algorithm based on their similarity levels.
Afterwards, a correlation score is measured for each sentence to determine its
importance level in each cluster and assign it as a tag for that community. The
second framework is based on a question-answering neural network model trained
to extract answers to multiple different questions. The collected answers are
effectively clustered to find multiple distinct answers to a single question
that might be asked by a customer. The proposed frameworks are shown to be more
comprehensive than existing review-processing solutions.
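The first framework described in the abstract (vectorize sentences, group them by similarity, then score each sentence within its group) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes toy bag-of-words vectors in place of word embeddings, connected components of a thresholded cosine-similarity graph in place of the community detection algorithm, and mean similarity to cluster peers in place of the correlation score. The function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector (assumption: stands in for word embeddings)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(sentences, threshold=0.5):
    """Group sentence indices by connected components of the similarity graph.

    Assumption: a simple substitute for a real community detection algorithm.
    """
    vecs = [embed(s) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(comp))
    return clusters, sim


def representative(comp, sim):
    """Return the index with the highest mean similarity to its cluster peers."""
    if len(comp) == 1:
        return comp[0]
    return max(
        comp,
        key=lambda i: sum(sim[i][j] for j in comp if j != i) / (len(comp) - 1),
    )


def summarize(sentences, threshold=0.5):
    """Return one representative sentence (the cluster 'tag') per cluster."""
    clusters, sim = cluster(sentences, threshold)
    return [sentences[representative(c, sim)] for c in clusters]
```

Calling `summarize` on a list of review sentences returns one sentence per detected group. The choice of `threshold` controls how aggressively sentences are merged, and a real system would likely replace both the vectors and the grouping step with stronger components.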
Related papers
- JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization [3.992091862806936]
Our solution integrates topic discovery and summarization into a single step.
Given text data, our Joint Aspect Discovery and Summarization algorithm (JADS) discovers aspects from the input.
Our proposed method achieves higher semantic alignment with ground truth and is factual.
arXiv Detail & Related papers (2024-05-28T23:01:57Z)
- Text Summarization with Oracle Expectation [88.39032981994535]
Extractive summarization produces summaries by identifying and concatenating the most important sentences in a document.
Most summarization datasets do not come with gold labels indicating whether document sentences are summary-worthy.
We propose a simple yet effective labeling algorithm that creates soft, expectation-based sentence labels.
arXiv Detail & Related papers (2022-09-26T14:10:08Z)
- Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering [2.6094411360258185]
In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data.
Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors.
Our framework reduces costs at scale by automating the extraction of the most insightful pieces of information from survey data.
arXiv Detail & Related papers (2022-03-02T18:24:10Z)
- Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders [59.038157066874255]
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
arXiv Detail & Related papers (2020-12-14T07:31:17Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post comments extracted from Stack Exchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Context-based Transformer Models for Answer Sentence Selection [109.96739477808134]
In this paper, we analyze the role of the contextual information in the sentence selection task.
We propose a Transformer based architecture that leverages two types of contexts, local and global.
The results show that the combination of local and global contexts in a Transformer model significantly improves the accuracy in Answer Sentence Selection.
arXiv Detail & Related papers (2020-06-01T21:52:19Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)