Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning
- URL: http://arxiv.org/abs/2010.01781v1
- Date: Mon, 5 Oct 2020 05:04:14 GMT
- Title: Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning
- Authors: Hanlu Wu, Tengfei Ma, Lingfei Wu, Tariro Manyumwa and Shouling Ji
- Abstract summary: We propose to evaluate summary quality without reference summaries via unsupervised contrastive learning.
Specifically, we design a new BERT-based metric that covers both linguistic quality and semantic informativeness.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
- Score: 66.30909748400023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluation of a document summarization system has been a critical factor in the success of the summarization task. Previous approaches, such as ROUGE, mainly consider the informativeness of the assessed summary and require human-generated references for each test summary. In this work, we propose to evaluate summary quality without reference summaries via unsupervised contrastive learning. Specifically, we design a new BERT-based metric that covers both linguistic quality and semantic informativeness. To learn the metric, for each summary we construct different types of negative samples with respect to different aspects of summary quality, and train our model with a ranking loss. Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries. Furthermore, we show that our method is general and transferable across datasets.
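In code terms, the recipe the abstract describes is: score each summary with a BERT encoder, construct corrupted negatives, and optimize a margin ranking objective so intact summaries outrank their negatives. Below is a minimal PyTorch sketch of that idea; the `QualityScorer` architecture, the sentence-shuffling corruption, and the margin value are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch: BERT-based quality scorer trained with a ranking loss over
# constructed negatives. Names, corruption choice, and margin are assumptions.
import random
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class QualityScorer(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.last_hidden_state[:, 0]).squeeze(-1)  # [CLS] -> score

def corrupt(summary: str) -> str:
    """One possible negative sample: shuffle sentences to degrade coherence."""
    sents = summary.split(". ")
    random.shuffle(sents)
    return ". ".join(sents)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
scorer = QualityScorer()
rank_loss = nn.MarginRankingLoss(margin=1.0)

summary = "The court ruled on Monday. The appeal was denied. Costs were awarded."
batch = tokenizer([summary, corrupt(summary)],
                  padding=True, truncation=True, return_tensors="pt")
scores = scorer(batch["input_ids"], batch["attention_mask"])
# target = 1 asks the intact summary (index 0) to outrank the corrupted one
loss = rank_loss(scores[0:1], scores[1:2], torch.ones(1))
loss.backward()
```

In the paper, different corruption types target different aspects of summary quality (e.g., coherence vs. informativeness); the shuffle above stands in for just one of them.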
Related papers
- Improving Factuality of Abstractive Summarization via Contrastive Reward Learning [77.07192378869776]
We propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics.
Empirical studies demonstrate that the proposed framework enables summarization models to learn from the feedback of factuality metrics.
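One way to read such a framework: generate several candidate summaries, score them with a factuality metric as the reward, and fine-tune the model so higher-reward candidates receive higher likelihood. The pairwise margin loss below is a hedged sketch in that spirit; the exact loss form and the rank-scaled margin are assumptions.

```python
# Hedged sketch of reward-based contrastive fine-tuning: candidates are ranked
# by a factuality reward, and higher-reward candidates must receive a higher
# model log-likelihood, separated by a rank-scaled margin (an assumption).
import torch
import torch.nn.functional as F

def contrastive_reward_loss(log_likelihoods: torch.Tensor,
                            rewards: torch.Tensor,
                            margin: float = 0.01) -> torch.Tensor:
    """One value per candidate summary in both tensors."""
    order = rewards.argsort(descending=True)
    ll = log_likelihoods[order]            # sorted best-reward first
    loss = torch.tensor(0.0)
    for i in range(len(ll)):
        for j in range(i + 1, len(ll)):
            # candidate i has the higher reward: enforce ll[i] > ll[j] + margin
            loss = loss + F.relu(ll[j] - ll[i] + margin * (j - i))
    return loss
```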
arXiv Detail & Related papers (2023-07-10T12:01:18Z)
- Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks [45.550554287918885]
This paper focuses on evaluating the usefulness of text summaries with extrinsic methods.
We design three different downstream tasks for extrinsic human evaluation of summaries, i.e., question answering, text classification and text similarity assessment.
We find summaries are particularly useful in tasks that rely on an overall judgment of the text, while being less effective for question answering tasks.
arXiv Detail & Related papers (2023-05-24T11:34:39Z)
- A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our method significantly outperforms existing methods on both multi-document and single-document summarization evaluation.
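A rough sketch of how such a training-free score could be assembled, assuming a sentence-embedding encoder and a simple relevance-minus-redundancy combination (both assumptions, not the paper's exact formulation):

```python
# Sketch: centrality-weighted relevance plus self-referenced redundancy.
# Encoder choice and the final combination are illustrative assumptions.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def evaluate(doc_sents: list[str], summary_sents: list[str]) -> float:
    D = encoder.encode(doc_sents, normalize_embeddings=True)
    S = encoder.encode(summary_sents, normalize_embeddings=True)

    # Centrality: how representative each document sentence is of the document.
    centrality = (D @ D.T).mean(axis=1)
    weights = centrality / centrality.sum()

    # Relevance: how well the summary covers each document sentence,
    # weighted so that central sentences matter more.
    relevance = float(weights @ (D @ S.T).max(axis=1))

    # Redundancy: mean pairwise similarity between distinct summary sentences
    # (subtracting n removes the diagonal of self-similarities, each ~1).
    sim = S @ S.T
    n = len(summary_sents)
    redundancy = float((sim.sum() - n) / (n * (n - 1))) if n > 1 else 0.0

    return relevance - redundancy  # higher is better (assumed combination)
```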
arXiv Detail & Related papers (2021-06-26T05:11:27Z)
- Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings [0.0]
We propose a new reference-free summary quality evaluation measure, with an emphasis on faithfulness.
The proposed ESTIME, Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings, correlates more strongly with expert scores on the summary-level SummEval dataset than other common evaluation measures.
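One plausible reading of the mismatched-embeddings idea, sketched below: align each summary token to its most similar contextual embedding in the source text and count alignments whose surface tokens disagree. The model choice and the mismatch-rate formula are assumptions, not ESTIME's exact procedure.

```python
# Loose sketch: nearest-neighbor alignment of summary-token embeddings into
# the source text, counting token mismatches. An interpretation, not ESTIME.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def embed(text: str):
    enc = tok(text, return_tensors="pt", truncation=True)
    hidden = model(**enc).last_hidden_state[0, 1:-1]   # drop [CLS]/[SEP]
    ids = enc["input_ids"][0, 1:-1]
    return torch.nn.functional.normalize(hidden, dim=-1), ids

def inconsistency(text: str, summary: str) -> float:
    th, tids = embed(text)
    sh, sids = embed(summary)
    nearest = (sh @ th.T).argmax(dim=1)     # most similar source position
    mismatches = (tids[nearest] != sids).sum().item()
    return mismatches / len(sids)           # fraction of mismatched tokens
```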
arXiv Detail & Related papers (2021-04-12T01:58:21Z)
- How to Evaluate a Summarizer: Study Design and Statistical Analysis for Manual Linguistic Quality Evaluation [3.624563211765782]
We show that the best choice of evaluation method can vary from one aspect to another.
We show that the total number of annotators can have a strong impact on study power.
Current statistical analysis methods can inflate type I error rates by up to eight-fold.
arXiv Detail & Related papers (2021-01-27T10:14:15Z)
- Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries [74.28810048824519]
We analyze the token alignments used by ROUGE and BERTScore to compare summaries.
We argue that their scores largely cannot be interpreted as measuring information overlap.
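To make "token alignments" concrete, the sketch below extracts the greedy alignment a BERTScore-style metric induces: each candidate token is matched to its most similar reference token, and inspecting those pairs shows what the score actually rewards. The model and helper names are assumptions.

```python
# Sketch: recover the greedy token alignment behind a BERTScore-style metric
# so the matched pairs can be inspected directly. Names are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def alignments(candidate: str, reference: str):
    def states(text):
        enc = tok(text, return_tensors="pt", truncation=True)
        h = model(**enc).last_hidden_state[0, 1:-1]          # drop [CLS]/[SEP]
        toks = tok.convert_ids_to_tokens(enc["input_ids"][0, 1:-1].tolist())
        return torch.nn.functional.normalize(h, dim=-1), toks

    ch, ctoks = states(candidate)
    rh, rtoks = states(reference)
    sim = ch @ rh.T                          # cosine similarity matrix
    best = sim.argmax(dim=1).tolist()        # greedy best reference match
    return [(c, rtoks[j], float(sim[i, j]))
            for i, (c, j) in enumerate(zip(ctoks, best))]
```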
arXiv Detail & Related papers (2020-10-23T15:55:15Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling [25.299937353444854]
We present a proof-of-concept study of a weakly supervised summary evaluation approach that works without reference summaries.
The massive data in existing summarization datasets is transformed for training by pairing documents with corrupted reference summaries.
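The corruption step might look like the sketch below; the specific operations (sentence deletion, replacement with a sentence from another summary, shuffling) and the binary quality labels are assumptions for illustration.

```python
# Sketch of weak supervision via corrupted reference summaries. The corruption
# operations and the 1.0/0.0 quality labels are illustrative assumptions.
import random

def corrupt_summary(summary: str, other_summary: str) -> str:
    sents = summary.split(". ")
    op = random.choice(["delete", "replace", "shuffle"])
    if op == "delete" and len(sents) > 1:
        sents.pop(random.randrange(len(sents)))
    elif op == "replace":
        foreign = random.choice(other_summary.split(". "))
        sents[random.randrange(len(sents))] = foreign
    else:
        random.shuffle(sents)
    return ". ".join(sents)

def make_pairs(dataset):
    """dataset: list of (document, reference_summary) tuples."""
    refs = [ref for _, ref in dataset]
    for i, (doc, ref) in enumerate(dataset):
        other = refs[(i + 1) % len(refs)]            # summary of another document
        yield doc, ref, 1.0                          # intact pair: high label
        yield doc, corrupt_summary(ref, other), 0.0  # corrupted pair: low label
```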
arXiv Detail & Related papers (2020-05-13T15:40:13Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
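A toy version of the noising step, with word dropping as token-level noise and a segment spliced from another review as document-level noise (both assumed noise operations):

```python
# Toy sketch: treat a sampled review as a pseudo-summary and synthesize noisy
# "input reviews" from it. The two noise operations are assumptions.
import random

def noisy_version(review: str, corpus: list[str], drop_p: float = 0.2) -> str:
    # Token-level noise: randomly drop words.
    words = [w for w in review.split() if random.random() > drop_p]
    # Document-level noise: splice in a segment from another review.
    other = random.choice(corpus).split()
    segment = other[: random.randint(1, max(1, len(other) // 4))]
    insert_at = random.randrange(len(words) + 1)
    return " ".join(words[:insert_at] + segment + words[insert_at:])

def make_training_example(corpus: list[str], k: int = 8):
    pseudo_summary = random.choice(corpus)           # pretend it is a summary
    noisy_inputs = [noisy_version(pseudo_summary, corpus) for _ in range(k)]
    return noisy_inputs, pseudo_summary              # train a model to denoise
```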
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
- Learning by Semantic Similarity Makes Abstractive Summarization Better [13.324006587838522]
We compare summaries generated by a recent language model, BART, with the reference summaries from a benchmark dataset, CNN/DM.
Interestingly, the model-generated summaries receive higher scores than the reference summaries.
arXiv Detail & Related papers (2020-02-18T17:59:02Z)