SueNes: A Weakly Supervised Approach to Evaluating Single-Document
Summarization via Negative Sampling
- URL: http://arxiv.org/abs/2005.06377v3
- Date: Thu, 5 May 2022 04:00:10 GMT
- Title: SueNes: A Weakly Supervised Approach to Evaluating Single-Document
Summarization via Negative Sampling
- Authors: Forrest Sheng Bao, Hebi Li, Ge Luo, Minghui Qiu, Yinfei Yang, Youbiao
He, Cen Chen
- Abstract summary: We present a proof-of-concept study of a weakly supervised summary evaluation approach that does not require reference summaries.
Massive data in existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries.
- Score: 25.299937353444854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Canonical automatic summary evaluation metrics, such as ROUGE, focus on
lexical similarity, which cannot capture semantics or linguistic quality well,
and require a reference summary that is costly to obtain. Recently, there has
been a growing number of efforts to alleviate either or both of these two
drawbacks. In this paper, we present a proof-of-concept study of a weakly
supervised summary evaluation approach that does not require reference
summaries. Massive data in existing summarization datasets are transformed for
training by pairing documents with corrupted reference summaries. In
cross-domain tests, our strategy outperforms baselines with promising
improvements and shows a great advantage in gauging linguistic qualities over
all metrics.
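The negative-sampling idea can be illustrated with a minimal sketch: each document is paired with its intact reference summary as a positive example and with a corrupted version of that summary as a negative one. The corruption operations below (random word deletion and shuffling) and the (document, summary, score) triple format are illustrative assumptions, not the paper's exact mutation strategies.

```python
import random

def corrupt(summary: str, del_prob: float = 0.3) -> str:
    """Corrupt a reference summary by randomly deleting and shuffling words.
    (Illustrative corruption ops; the paper explores several mutation strategies.)"""
    words = summary.split()
    kept = [w for w in words if random.random() > del_prob] or words[:1]
    random.shuffle(kept)
    return " ".join(kept)

def build_training_pairs(dataset):
    """Turn (document, reference_summary) pairs into weakly labeled
    (document, summary, score) triples: 1.0 for intact references, 0.0 for corrupted ones."""
    pairs = []
    for doc, ref in dataset:
        pairs.append((doc, ref, 1.0))            # positive example
        pairs.append((doc, corrupt(ref), 0.0))   # negative example
    return pairs

# Toy usage with one (document, reference summary) pair.
toy = [("The council voted to expand the park by ten acres.", "Council approves park expansion.")]
print(build_training_pairs(toy))
```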
Related papers
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing coherent yet factually inconsistent summaries to be generated with high error-type coverage.
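As a rough illustration of the AMR-perturbation idea, the sketch below parses an AMR string with the penman library and swaps one concept node to inject a factual inconsistency. The hand-written example graph, the chosen perturbation, and the omitted AMR-to-text generation step are all assumptions; the paper's actual pipeline relies on trained AMR parsing and generation models.

```python
import penman

# A hand-written AMR for "The boy wants to go." (illustrative example graph).
amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
graph = penman.decode(amr)

def swap_concept(graph: penman.Graph, old: str, new: str) -> penman.Graph:
    """Inject a controlled inconsistency by replacing one concept node."""
    triples = [
        (src, role, new if role == ":instance" and tgt == old else tgt)
        for src, role, tgt in graph.triples
    ]
    return penman.Graph(triples, top=graph.top)

perturbed = swap_concept(graph, "boy", "girl")  # "boy" -> "girl": a factual inconsistency
print(penman.encode(perturbed))
# A negative summary would then be produced by an AMR-to-text model (not shown here).
```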
arXiv Detail & Related papers (2023-11-16T02:56:29Z) - Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of
Lexical Overlap in Train and Test Reference Summaries [131.80860903537172]
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote.
We propose a fine-grained evaluation protocol by partitioning a test set based on the lexical similarity of reference test summaries with training summaries.
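A small sketch of this partitioning idea follows, using unigram overlap between each test reference summary and the pool of training references as the similarity signal; the overlap measure and the bucket threshold are assumptions chosen for illustration rather than the paper's exact protocol.

```python
def unigram_overlap(summary: str, train_refs: list[str]) -> float:
    """Fraction of a test reference's unigrams that also appear in any training reference."""
    test_tokens = set(summary.lower().split())
    train_tokens = set(tok for ref in train_refs for tok in ref.lower().split())
    return len(test_tokens & train_tokens) / max(len(test_tokens), 1)

def partition_test_set(test_refs: list[str], train_refs: list[str], threshold: float = 0.5):
    """Split the test set into high- and low-overlap buckets so models can be evaluated
    separately on seen-like content versus novel summary-worthy content."""
    high = [r for r in test_refs if unigram_overlap(r, train_refs) >= threshold]
    low = [r for r in test_refs if unigram_overlap(r, train_refs) < threshold]
    return {"high_overlap": high, "low_overlap": low}
```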
arXiv Detail & Related papers (2023-11-15T23:47:53Z) - Towards Interpretable Summary Evaluation via Allocation of Contextual
Embeddings to Reference Text Topics [1.5749416770494706]
The multifaceted interpretable summary evaluation method (MISEM) is based on allocation of a summary's contextual token embeddings to semantic topics identified in the reference text.
MISEM achieves a promising .404 Pearson correlation with human judgment on the TAC'08 dataset.
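A rough sketch of the allocation idea: cluster reference-text sentence embeddings into topics, assign each summary sentence to its nearest topic, and report topic coverage. The encoder choice, the clustering method, and the sentence-level (rather than token-level) granularity are simplifying assumptions; MISEM itself allocates contextual token embeddings.

```python
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

def topic_coverage(reference_sents, summary_sents, n_topics=4):
    """Cluster reference sentences into topics and measure how many topics the summary touches.
    Requires n_topics <= len(reference_sents)."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
    ref_emb = model.encode(reference_sents)
    sum_emb = model.encode(summary_sents)
    topics = KMeans(n_clusters=n_topics, n_init=10).fit(ref_emb)
    # Allocate each summary sentence to its most similar topic centroid.
    sims = cosine_similarity(sum_emb, topics.cluster_centers_)
    assigned = set(sims.argmax(axis=1).tolist())
    return len(assigned) / n_topics  # fraction of reference topics covered by the summary
```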
arXiv Detail & Related papers (2022-10-25T17:09:08Z) - SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z) - A Training-free and Reference-free Summarization Evaluation Metric via
Centrality-weighted Relevance and Self-referenced Redundancy [60.419107377879925]
We propose a training-free and reference-free summarization evaluation metric.
Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score.
Our methods can significantly outperform existing methods on both multi-document and single-document summarization evaluation.
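A compact sketch of the two-part score using sentence embeddings: sentence centrality within the source document weights a relevance term, and a self-referenced redundancy term penalizes repeated content in the summary. The embedding model and the simple subtraction used to combine the two terms are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def score_summary(doc_sents, summary_sents):
    """Centrality-weighted relevance minus self-referenced redundancy (illustrative combination)."""
    doc_emb = _model.encode(doc_sents)
    sum_emb = _model.encode(summary_sents)
    # Centrality: how similar each source sentence is to the rest of the document.
    doc_sim = cosine_similarity(doc_emb)
    centrality = (doc_sim.sum(axis=1) - 1.0) / max(len(doc_sents) - 1, 1)
    # Relevance: summary-to-source similarity, weighted by source-sentence centrality.
    relevance = (cosine_similarity(sum_emb, doc_emb) * centrality).max(axis=1).mean()
    # Redundancy: average pairwise similarity among summary sentences (self-referenced).
    sum_sim = cosine_similarity(sum_emb)
    n = len(summary_sents)
    redundancy = (sum_sim.sum() - n) / max(n * (n - 1), 1)
    return relevance - redundancy
```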
arXiv Detail & Related papers (2021-06-26T05:11:27Z) - Contextualized Rewriting for Text Summarization [10.666547385992935]
We formalize rewriting as a seq2seq problem with group alignments.
Results show that our approach significantly outperforms non-contextualized rewriting systems.
arXiv Detail & Related papers (2021-01-31T05:35:57Z) - Unsupervised Reference-Free Summary Quality Evaluation via Contrastive
Learning [66.30909748400023]
We propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
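One way such a contrastive, reference-free scorer could be set up is sketched below: a BERT encoder scores (document, summary) pairs and is trained with a margin ranking loss so that intact summaries outrank degraded ones. The model name, scoring head, and toy negative construction are assumptions for illustration; the paper's actual metric design differs in detail.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class BertScorer(nn.Module):
    """Scores a (document, summary) pair with BERT; trained so intact summaries outrank degraded ones."""
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(name)
        self.bert = AutoModel.from_pretrained(name)
        self.head = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, docs, summaries):
        batch = self.tokenizer(docs, summaries, truncation=True, padding=True, return_tensors="pt")
        cls = self.bert(**batch).last_hidden_state[:, 0]  # [CLS] representation of the pair
        return self.head(cls).squeeze(-1)

# Contrastive objective: the intact summary should score higher than a degraded one.
scorer = BertScorer()
loss_fn = nn.MarginRankingLoss(margin=1.0)
docs = ["the city opened a new library downtown this week"]
good = ["a new downtown library opened this week"]       # intact summary
bad = ["week this opened library downtown new a"]         # degraded (shuffled) negative
loss = loss_fn(scorer(docs, good), scorer(docs, bad), torch.ones(len(docs)))
loss.backward()
```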
arXiv Detail & Related papers (2020-10-05T05:04:14Z) - SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for
Multi-Document Summarization [31.082618343998533]
We propose SUPERT, which rates the quality of a summary by measuring its semantic similarity with a pseudo reference summary.
Compared to the state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with human ratings by 18-39%.
We use SUPERT as rewards to guide a neural-based reinforcement learning summarizer, yielding favorable performance compared to the state-of-the-art unsupervised summarizers.
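A rough sketch of the pseudo-reference idea: extract salient sentences from the source documents (here simply each document's leading sentences, an assumption; SUPERT also explores other selection strategies) and rate the summary by embedding similarity to that pseudo reference, with no human-written reference involved.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

_model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder; SUPERT uses contextual encoders similarly

def supert_like_score(source_docs, summary, sents_per_doc=3):
    """Build a pseudo reference from each document's leading sentences and
    rate the summary by its embedding similarity to that pseudo reference."""
    pseudo_ref = " ".join(
        ". ".join(doc.split(". ")[:sents_per_doc]) for doc in source_docs
    )
    emb = _model.encode([pseudo_ref, summary])
    return float(cosine_similarity(emb[0:1], emb[1:2])[0, 0])
```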
arXiv Detail & Related papers (2020-05-07T19:54:24Z) - Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
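The synthetic-data construction step can be sketched as follows: one review is sampled to act as the pseudo summary, and noisy versions of it serve as model inputs for denoising training. The specific noise operations here (token dropout and splicing in a segment from another review) are illustrative assumptions.

```python
import random

def make_noisy_versions(reviews, n_noisy=8, drop_prob=0.2):
    """Sample a review as the pseudo summary and create noisy inputs for denoising training."""
    pseudo_summary = random.choice(reviews)
    others = [r for r in reviews if r != pseudo_summary]
    noisy_inputs = []
    for _ in range(n_noisy):
        tokens = [t for t in pseudo_summary.split() if random.random() > drop_prob]
        # Splice in a segment from another review to simulate non-consensus content.
        if others:
            extra = random.choice(others).split()
            start = random.randrange(max(len(extra) - 5, 1))
            tokens += extra[start:start + 5]
        noisy_inputs.append(" ".join(tokens))
    return noisy_inputs, pseudo_summary  # (model inputs, denoising target)
```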
arXiv Detail & Related papers (2020-04-21T16:54:57Z)