Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods
- URL: http://arxiv.org/abs/2208.13265v1
- Date: Sun, 28 Aug 2022 18:24:41 GMT
- Title: Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods
- Authors: Potsawee Manakul, Mark J. F. Gales
- Abstract summary: We describe a new dataset, the podcast summary assessment corpus.
This dataset has two unique aspects: (i) long-input, speech-based podcast documents; and (ii) an opportunity to detect inappropriate reference summaries in the podcast corpus.
- Score: 42.08097583183816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic summary assessment is useful for both machine-generated and
human-produced summaries. Automatically evaluating the summary text given the
document enables, for example, summary generation system development and
detection of inappropriate summaries. Summary assessment can be run in a number
of modes: ranking summary generation systems; ranking summaries of a particular
document; and estimating the quality of a document-summary pair on an absolute
scale. Existing datasets with annotation for summary assessment are usually
based on news summarization datasets such as CNN/DailyMail or XSum. In this
work, we describe a new dataset, the podcast summary assessment corpus, a
collection of podcast summaries that were evaluated by human experts at
TREC2020. Compared to existing summary assessment data, this dataset has two
unique aspects: (i) long-input, speech-based podcast documents; and (ii) an
opportunity to detect inappropriate reference summaries in the podcast corpus.
First, we examine existing assessment methods, including model-free and
model-based methods, and provide benchmark results for this long-input summary
assessment dataset. Second, with the aim of filtering reference
summary-document pairings for training, we apply summary assessment for data
selection. The experimental results on these two aspects provide interesting
insights on the summary assessment and generation tasks. The podcast summary
assessment data is available.
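To make the data-selection use case concrete, here is a minimal sketch of filtering reference summary-document pairings by an assessment score before training. The unigram-overlap metric, the threshold, and the toy pairs are illustrative placeholders, not the model-free or model-based assessors actually benchmarked in the paper.

```python
# Illustrative sketch only: score document-summary pairs with a simple
# model-free overlap metric and keep only pairs above a threshold, so
# inappropriate reference summaries are dropped from the training set.
from collections import Counter

def unigram_f1(summary: str, document: str) -> float:
    """Token-overlap F1 between a summary and its source document."""
    s_tokens = Counter(summary.lower().split())
    d_tokens = Counter(document.lower().split())
    overlap = sum((s_tokens & d_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(s_tokens.values())
    recall = overlap / sum(d_tokens.values())
    return 2 * precision * recall / (precision + recall)

def select_training_pairs(pairs, threshold=0.05):
    """Keep only document-summary pairs whose assessment score clears the threshold."""
    return [(doc, summ) for doc, summ in pairs if unigram_f1(summ, doc) >= threshold]

if __name__ == "__main__":
    pairs = [
        ("the host interviews a climber about their expedition",
         "an interview about a climbing expedition"),
        ("a long episode transcript about cooking pasta",
         "subscribe now and leave five stars"),
    ]
    for doc, summ in select_training_pairs(pairs):
        print("kept:", summ)
```

In practice the score would come from one of the assessment methods benchmarked in the paper rather than a bag-of-words overlap.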
Related papers
- Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks [45.550554287918885]
This paper focuses on evaluating the usefulness of text summaries with extrinsic methods.
We design three different downstream tasks for extrinsic human evaluation of summaries, i.e., question answering, text classification and text similarity assessment.
We find summaries are particularly useful in tasks that rely on an overall judgment of the text, while being less effective for question answering tasks.
arXiv Detail & Related papers (2023-05-24T11:34:39Z)
- Towards Personalized Review Summarization by Modeling Historical Reviews from Customer and Product Separately [59.61932899841944]
Review summarization is a non-trivial task that aims to summarize the main ideas of product reviews on e-commerce websites.
We propose the Heterogeneous Historical Review aware Review Summarization Model (HHRRS).
We employ a multi-task framework that conducts the review sentiment classification and summarization jointly.
arXiv Detail & Related papers (2023-01-27T12:32:55Z)
- RISE: Leveraging Retrieval Techniques for Summarization Evaluation [3.9215337270154995]
We present RISE, a new approach for evaluating summaries by leveraging techniques from information retrieval.
RISE is first trained on a retrieval task using a dual-encoder retrieval setup, and can then be used to evaluate a generated summary given an input document, without gold reference summaries.
We conduct comprehensive experiments on the SummEval benchmark (Fabbri et al., 2021) and the results show that RISE has higher correlation with human evaluations compared to many past approaches to summarization evaluation.
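As a rough illustration of the reference-free, dual-encoder scoring interface described above (not RISE's actual trained encoders), one could embed the document and the candidate summary separately and use their similarity as the score; the off-the-shelf sentence-transformers encoder below is an assumption for demonstration only.

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Placeholder off-the-shelf encoder; RISE trains its own dual-encoder retriever.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def score_summary(document: str, summary: str) -> float:
    """Reference-free score: similarity between document and summary embeddings."""
    doc_emb = encoder.encode(document, convert_to_tensor=True)
    sum_emb = encoder.encode(summary, convert_to_tensor=True)
    return util.cos_sim(doc_emb, sum_emb).item()

print(score_summary("A two-hour podcast episode about training for a first marathon.",
                    "An episode about marathon training."))
```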
arXiv Detail & Related papers (2022-12-17T01:09:22Z)
- Re-evaluating Evaluation in Text Summarization [77.4601291738445]
We re-evaluate the evaluation method for text summarization using top-scoring system outputs.
We find that conclusions about evaluation metrics on older datasets do not necessarily hold on modern datasets and systems.
arXiv Detail & Related papers (2020-10-14T13:58:53Z)
- Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate summary quality without reference summaries via unsupervised contrastive learning.
Specifically, we design a new metric which covers both linguistic qualities and semantic informativeness based on BERT.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
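A minimal sketch of the contrastive intuition, assuming a generic scorer and a simple word-shuffling perturbation rather than the paper's BERT-based metric: an intact summary should be ranked above a degraded copy of itself, so such copies can serve as training negatives without any reference summaries.

```python
import random

import torch
import torch.nn as nn

def degrade(summary: str, seed: int = 0) -> str:
    """Build a contrastive negative by shuffling word order (hurts linguistic quality)."""
    words = summary.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

# In the paper the scorer is BERT-based; a tiny linear layer over placeholder
# feature vectors stands in here just to show the ranking objective.
scorer = nn.Linear(4, 1)
features_intact = torch.randn(1, 4)    # features of the intact summary (placeholder)
features_degraded = torch.randn(1, 4)  # features of the shuffled summary (placeholder)
loss = nn.MarginRankingLoss(margin=0.1)(
    scorer(features_intact), scorer(features_degraded), torch.ones(1, 1)
)
loss.backward()
print("contrastive negative:", degrade("the hosts compare two microphones for field recording"))
```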
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)
- SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling [25.299937353444854]
We present a proof-of-concept study of a weakly supervised summary evaluation approach that does not require reference summaries.
Large amounts of data from existing summarization datasets are transformed for training by pairing documents with corrupted reference summaries.
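The negative-sampling transformation can be pictured with a small sketch; the corruption scheme and the labels below are illustrative assumptions, not SueNes's exact procedure.

```python
# Sketch of weak supervision via negative sampling: pair each document with its
# own reference (high label) and with corrupted or mismatched references (lower
# labels), yielding training data for a scorer without human quality annotations.
import random

def corrupt(summary: str, rng: random.Random) -> str:
    """Drop roughly half of the words to simulate a degraded summary."""
    words = summary.split()
    kept = [w for w in words if rng.random() > 0.5] or words[:1]
    return " ".join(kept)

def build_pairs(dataset, seed: int = 0):
    rng = random.Random(seed)
    examples = []
    for i, (doc, ref) in enumerate(dataset):
        examples.append((doc, ref, 1.0))                # genuine pair
        examples.append((doc, corrupt(ref, rng), 0.5))  # degraded reference
        other_ref = dataset[(i + 1) % len(dataset)][1]
        examples.append((doc, other_ref, 0.0))          # mismatched reference
    return examples

dataset = [
    ("transcript of an episode on deep sea exploration",
     "an episode about exploring the deep sea"),
    ("transcript of an episode on sourdough baking",
     "a conversation about baking sourdough bread"),
]
for doc, summ, label in build_pairs(dataset):
    print(label, "|", summ)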
arXiv Detail & Related papers (2020-05-13T15:40:13Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
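A toy sketch of that synthetic-data construction, with simplified noising (mixing in another review and dropping words) standing in for the paper's noising procedures:

```python
# Sketch: treat one sampled review as the pseudo-summary and build noisy input
# versions around it, so a model can later be trained to denoise such inputs
# back into a summary. Noising choices here are illustrative only.
import random

def make_synthetic_example(reviews, rng):
    pseudo_summary = rng.choice(reviews)
    noisy_inputs = []
    for _ in range(3):
        other = rng.choice(reviews)
        words = (pseudo_summary + " " + other).split()
        noisy = " ".join(w for w in words if rng.random() > 0.2)
        noisy_inputs.append(noisy)
    return noisy_inputs, pseudo_summary

reviews = [
    "battery life is great but the screen is dim",
    "excellent battery, lasts two days",
    "screen could be brighter, otherwise solid phone",
]
inputs, target = make_synthetic_example(reviews, random.Random(0))
print("target summary:", target)
for text in inputs:
    print("noisy input:", text)
```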
arXiv Detail & Related papers (2020-04-21T16:54:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.