Automatically Evaluating Opinion Prevalence in Opinion Summarization
- URL: http://arxiv.org/abs/2307.14305v1
- Date: Wed, 26 Jul 2023 17:13:00 GMT
- Title: Automatically Evaluating Opinion Prevalence in Opinion Summarization
- Authors: Christopher Malon
- Abstract summary: We propose an automatic metric to test the prevalence of the opinions that a summary expresses.
We consider several existing methods to score the factual consistency of a summary statement.
We show that a human-authored summary has only slightly better opinion prevalence than randomly selected extracts from the source reviews.
- Score: 0.9971537447334835
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: When faced with a large number of product reviews, it is not clear that a
human can remember all of them and weight opinions representatively to write a
good reference summary. We propose an automatic metric to test the prevalence
of the opinions that a summary expresses, based on counting the number of
reviews that are consistent with each statement in the summary, while
discrediting trivial or redundant statements. To formulate this opinion
prevalence metric, we consider several existing methods to score the factual
consistency of a summary statement with respect to each individual source
review. On a corpus of Amazon product reviews, we gather multiple human
judgments of the opinion consistency, to determine which automatic metric best
expresses consistency in product reviews. Using the resulting opinion
prevalence metric, we show that a human-authored summary has only slightly
better opinion prevalence than randomly selected extracts from the source
reviews, and previous extractive and abstractive unsupervised opinion
summarization methods perform worse than humans. We demonstrate room for
improvement with a greedy construction of extractive summaries that achieves
twice the opinion prevalence of human-written summaries. Finally, we show that
preprocessing
source reviews by simplification can raise the opinion prevalence achieved by
existing abstractive opinion summarization systems to the level of human
performance.
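
The abstract specifies the metric only at a high level, so the sketch below is one possible reading rather than the paper's implementation: the callables `consistent`, `redundant`, and `trivial` are hypothetical stand-ins for whichever factual-consistency scorer and filtering rules are chosen, and the choice to average per-statement support fractions is an assumption.

```python
from typing import Callable, List

def opinion_prevalence(
    statements: List[str],
    reviews: List[str],
    consistent: Callable[[str, str], bool],       # does review r support statement s?
    redundant: Callable[[str, List[str]], bool],  # does s repeat an already-counted statement?
    trivial: Callable[[str], bool],               # would nearly any review support s?
) -> float:
    """Mean fraction of source reviews consistent with each summary
    statement, with trivial or redundant statements discredited (zero)."""
    if not statements or not reviews:
        return 0.0
    counted: List[str] = []   # non-trivial, non-redundant statements seen so far
    scores: List[float] = []
    for s in statements:
        if trivial(s) or redundant(s, counted):
            scores.append(0.0)  # discredited, but still counted in the denominator
        else:
            support = sum(1 for r in reviews if consistent(r, s))
            scores.append(support / len(reviews))
            counted.append(s)
    return sum(scores) / len(scores)
```

Per the abstract, the consistency judgment itself is chosen by comparing several existing factual-consistency scorers against human annotations, so any sentence-level scorer could in principle be plugged in as `consistent`.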
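The greedy construction of extractive summaries is likewise only named in the abstract. A minimal sketch, assuming the objective is simply to maximize the metric above under a fixed sentence budget (both assumptions), reusing `opinion_prevalence` and the imports from the previous block:

```python
def greedy_extractive_summary(
    candidates: List[str],  # e.g. all sentences drawn from the source reviews
    reviews: List[str],
    consistent: Callable[[str, str], bool],
    redundant: Callable[[str, List[str]], bool],
    trivial: Callable[[str], bool],
    max_statements: int = 5,  # assumed length budget
) -> List[str]:
    """Repeatedly add the candidate sentence that most raises opinion
    prevalence; stop when no remaining sentence improves it."""
    summary: List[str] = []
    pool = list(candidates)
    while pool and len(summary) < max_statements:
        def gain(c: str) -> float:
            return opinion_prevalence(summary + [c], reviews,
                                      consistent, redundant, trivial)
        best = max(pool, key=gain)
        if gain(best) <= opinion_prevalence(summary, reviews,
                                            consistent, redundant, trivial):
            break  # no remaining sentence raises the metric
        summary.append(best)
        pool.remove(best)
    return summary
```

Because the metric averages over statements, this variant stops once no candidate raises the running average, which naturally penalizes adding weakly supported sentences.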
Related papers
- GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews [25.291384842659397]
We introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews.
Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews.
arXiv Detail & Related papers (2024-06-11T15:27:01Z)
- Rationale-based Opinion Summarization [23.39553692130953]
We propose a new paradigm for summarizing reviews, rationale-based opinion summarization.
To extract good rationales, we define four desirable properties: relatedness, specificity, popularity, and diversity.
arXiv Detail & Related papers (2024-03-30T02:22:57Z)
- One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation [30.674896082482476]
We show that Op-I-Prompt emerges as a good alternative for evaluating opinion summaries, achieving an average Spearman correlation of 0.70 with humans.
To the best of our knowledge, we are the first to investigate LLMs as evaluators on both closed-source and open-source models in the opinion summarization domain.
arXiv Detail & Related papers (2024-02-18T19:13:52Z)
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces, user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- Learning Opinion Summarizers by Selecting Informative Reviews [81.47506952645564]
We collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training.
The content of many reviews is not reflected in the human-written summaries, and thus a summarizer trained on random review subsets hallucinates.
We formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets.
arXiv Detail & Related papers (2021-09-09T15:01:43Z)
- A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our methods can significantly outperform existing methods on both multi-document and single-document summarization evaluation.
arXiv Detail & Related papers (2021-06-26T05:11:27Z)
- Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- OpinionDigest: A Simple Framework for Opinion Summarization [22.596995566588422]
The framework uses an Aspect-based Sentiment Analysis model to extract opinion phrases from reviews, and trains a Transformer model to reconstruct the original reviews from these extractions.
At summarization time, the most popular opinions are selected and used as input to the trained Transformer model, which verbalizes them into an opinion summary.
OpinionDigest can also generate customized summaries, tailored to specific user needs, by filtering the selected opinions according to their aspect and/or sentiment.
arXiv Detail & Related papers (2020-05-05T01:22:29Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
arXiv Detail & Related papers (2020-04-21T16:54:57Z)