Massive Multi-Document Summarization of Product Reviews with Weak Supervision
- URL: http://arxiv.org/abs/2007.11348v1
- Date: Wed, 22 Jul 2020 11:22:57 GMT
- Title: Massive Multi-Document Summarization of Product Reviews with Weak Supervision
- Authors: Ori Shapira and Ran Levy
- Abstract summary: Product reviews summarization is a type of Multi-Document Summarization (MDS) task.
We show that summarizing small samples of the reviews can result in loss of important information.
We propose a schema for summarizing a massive set of reviews on top of a standard summarization algorithm.
- Score: 11.462916848094403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Product reviews summarization is a type of Multi-Document Summarization (MDS)
task in which the summarized document sets are often far larger than in
traditional MDS (up to tens of thousands of reviews). We highlight this
difference and coin the term "Massive Multi-Document Summarization" (MMDS) to
denote an MDS task that involves hundreds of documents or more. Prior work on
product reviews summarization considered small samples of the reviews, mainly
due to the difficulty of handling massive document sets. We show that
summarizing small samples can result in loss of important information and
provide misleading evaluation results. We propose a schema for summarizing a
massive set of reviews on top of a standard summarization algorithm. Since
writing large volumes of reference summaries needed for advanced neural network
models is impractical, our solution relies on weak supervision. Finally, we
propose an evaluation scheme that is based on multiple crowdsourced reference
summaries and aims to capture the massive review collection. We show that an
initial implementation of our schema significantly improves over several
baselines in ROUGE scores, and exhibits strong coherence in a manual linguistic
quality assessment.
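As a rough, hypothetical illustration of scoring a system summary against multiple crowdsourced reference summaries with ROUGE, the sketch below keeps the best match per metric. It assumes the `rouge-score` Python package; taking the maximum over references is a common convention, not necessarily the exact protocol used in the paper.

```python
# Minimal sketch: multi-reference ROUGE evaluation (assumes the `rouge-score` package).
from rouge_score import rouge_scorer

def multi_reference_rouge(system_summary, reference_summaries):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    best = {}
    for ref in reference_summaries:
        scores = scorer.score(ref, system_summary)  # signature: score(target, prediction)
        for name, s in scores.items():
            # keep the best F1 over the crowdsourced references
            best[name] = max(best.get(name, 0.0), s.fmeasure)
    return best

# usage: multi_reference_rouge("the blender is quiet and durable",
#                              ["quiet, durable blender", "works well but pricey"])
```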
Related papers
- Unsupervised Multi-document Summarization with Holistic Inference [41.58777650517525]
This paper proposes a new holistic framework for unsupervised multi-document extractive summarization.
Subset Representative Index (SRI) balances the importance and diversity of a subset of sentences from the source documents.
Our findings suggest that diversity is essential for improving multi-document summary performance.
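As a hypothetical sketch of balancing importance and diversity when selecting a sentence subset, the MMR-style greedy procedure below illustrates the idea; it is not the paper's SRI formulation, and it assumes scikit-learn for TF-IDF and cosine similarity.

```python
# Hypothetical MMR-style selection balancing importance (centrality) against
# diversity (penalty on similarity to already-selected sentences).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_subset(sentences, k=3, trade_off=0.7):
    sim = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    importance = sim.mean(axis=1)  # mean similarity to all sentences
    selected = []
    while len(selected) < min(k, len(sentences)):
        best_i, best_score = None, -np.inf
        for i in range(len(sentences)):
            if i in selected:
                continue
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            score = trade_off * importance[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return [sentences[i] for i in selected]
```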
arXiv Detail & Related papers (2023-09-08T02:56:30Z)
- How Far are We from Robust Long Abstractive Summarization? [39.34743996451813]
We evaluate long document abstractive summarization systems (i.e., models and metrics) with the aim of implementing them to generate reliable summaries.
For long document evaluation metrics, human evaluation results show that ROUGE remains the best at evaluating the relevancy of a summary.
We release our annotated long document dataset with the hope that it can contribute to the development of metrics across a broader range of summarization settings.
arXiv Detail & Related papers (2022-10-30T03:19:50Z)
- How "Multi" is Multi-Document Summarization? [15.574673241564932]
It is expected that both reference summaries in MDS datasets, as well as system summaries, would indeed be based on dispersed information.
We propose an automated measure for evaluating the degree to which a summary is "disperse".
Our results show that certain MDS datasets barely require combining information from multiple documents, as a single document often covers the full summary content.
arXiv Detail & Related papers (2022-10-23T10:20:09Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- Learning Opinion Summarizers by Selecting Informative Reviews [81.47506952645564]
We collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training.
The content of many reviews is not reflected in the human-written summaries, and, thus, the summarizer trained on random review subsets hallucinates.
We formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets.
arXiv Detail & Related papers (2021-09-09T15:01:43Z)
- An Enhanced MeanSum Method For Generating Hotel Multi-Review Summarizations [0.06091702876917279]
This work uses a Multi-Aspect Masker (MAM) as a content selector to address the multi-aspect issue.
We also propose a regularizer to control the length of the generated summaries.
Our improved model achieves higher ROUGE and sentiment accuracy scores than the original MeanSum method.
arXiv Detail & Related papers (2020-12-07T13:16:01Z)
- SupMMD: A Sentence Importance Model for Extractive Summarization using Maximum Mean Discrepancy [92.5683788430012]
SupMMD is a novel technique for generic and update summarization based on the maximum mean discrepancy from kernel two-sample testing.
We show the efficacy of SupMMD in both generic and update summarization tasks by meeting or exceeding the current state-of-the-art on the DUC-2004 and TAC-2009 datasets.
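The quantity underlying SupMMD can be illustrated with a small, self-contained sketch of squared maximum mean discrepancy between two sets of sentence embeddings under an RBF kernel; the supervised importance model and embedding choices of the actual system are not shown.

```python
# Squared maximum mean discrepancy (MMD^2) between two embedding sets under an
# RBF kernel, as used in kernel two-sample testing (biased estimator).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd_squared(X, Y, gamma=1.0):
    # E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
    return (rbf_kernel(X, X, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())

# X: embeddings of candidate summary sentences, Y: embeddings of source sentences;
# a small MMD^2 indicates the candidate covers the sources well.
```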
arXiv Detail & Related papers (2020-10-06T09:26:55Z)
- Corpora Evaluation and System Bias Detection in Multi-document Summarization [25.131744693121508]
Multi-document summarization (MDS) is the task of reflecting key points from any set of documents into a concise text paragraph.
Owing to the lack of a standard definition of the task, we encounter a plethora of datasets with varying levels of overlap and conflict between participating documents.
New systems report results on a set of chosen datasets, which might not correlate with their performance on the other datasets.
arXiv Detail & Related papers (2020-10-05T05:25:43Z)
- Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate summary quality without reference summaries via unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression [61.97200991151141]
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents into a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
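A toy version of this pipeline, assuming scikit-learn, might look as follows; it uses TF-IDF in place of SummPip's linguistic and deep representations and picks a central sentence per cluster instead of compressing the cluster.

```python
# Toy SummPip-like pipeline: sentence graph -> spectral clustering -> one
# representative sentence per cluster (the real method compresses each cluster).
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_and_extract(sentences, n_clusters=3):
    n_clusters = min(n_clusters, len(sentences))
    tfidf = TfidfVectorizer().fit_transform(sentences)
    affinity = cosine_similarity(tfidf)  # dense adjacency of the sentence graph
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    summary = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        # representative = sentence most similar to the rest of its cluster
        rep = idx[np.argmax(affinity[np.ix_(idx, idx)].sum(axis=1))]
        summary.append(sentences[rep])
    return " ".join(summary)
```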
arXiv Detail & Related papers (2020-07-17T13:01:15Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
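A toy sketch of the noising step, with simple word dropout standing in for the paper's actual noise functions:

```python
# Toy noising step for building a synthetic (noisy reviews, pseudo-summary) pair;
# word dropout stands in for the paper's actual noise functions.
import random

def make_synthetic_pair(reviews, n_noisy=8, drop_prob=0.3, seed=0):
    rng = random.Random(seed)
    pseudo_summary = rng.choice(reviews)        # pretend one review is the summary
    noisy_inputs = []
    for _ in range(n_noisy):
        tokens = pseudo_summary.split()
        kept = [t for t in tokens if rng.random() > drop_prob] or tokens
        noisy_inputs.append(" ".join(kept))
    # A denoising summarizer is then trained to reconstruct `pseudo_summary`
    # from `noisy_inputs`; at test time it receives genuine reviews instead.
    return noisy_inputs, pseudo_summary
```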
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.