Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
- URL: http://arxiv.org/abs/2402.03509v1
- Date: Mon, 5 Feb 2024 20:51:11 GMT
- Title: Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
- Authors: Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton and Byron C Wallace
- Abstract summary: We evaluate zero-shot generated summaries across specialized domains including biomedical articles and legal bills.
We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors.
We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles.
- Score: 60.5207173547769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that large language models (LLMs) are capable of
generating summaries zero-shot (i.e., without explicit supervision) that, under
human assessment, are often comparable or even preferred to manually composed
reference summaries. However, this prior work has focussed almost exclusively
on evaluating news article summarization. How do zero-shot summarizers perform
in other (potentially more specialized) domains? In this work we evaluate
zero-shot generated summaries across specialized domains including biomedical
articles and legal bills (in addition to standard news benchmarks for
reference). We focus especially on the factuality of outputs. We acquire
annotations from domain experts to identify inconsistencies in summaries and
systematically categorize these errors. We analyze whether the prevalence of a
given domain in the pretraining corpus affects extractiveness and faithfulness
of generated summaries of articles in this domain. We release all collected
annotations to facilitate additional research toward measuring and realizing
factually accurate summarization, beyond news articles. The dataset can be
downloaded from https://github.com/sanjanaramprasad/zero_shot_faceval_domains
Related papers
- Incremental Extractive Opinion Summarization Using Cover Trees [81.59625423421355]
In online marketplaces user reviews accumulate over time, and opinion summaries need to be updated periodically.
In this work, we study the task of extractive opinion summarization in an incremental setting.
We present an efficient algorithm for accurately computing the CentroidRank summaries in an incremental setting.
arXiv Detail & Related papers (2024-01-16T02:00:17Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- OpineSum: Entailment-based self-training for abstractive opinion summarization [6.584115526134759]
We present a novel self-training approach, OpineSum, for abstractive opinion summarization.
The summaries in this approach are built using a novel application of textual entailment.
OpineSum achieves state-of-the-art performance in both settings.
arXiv Detail & Related papers (2022-12-21T06:20:28Z)
- Unsupervised Opinion Summarisation in the Wasserstein Space [22.634245146129857]
We present WassOS, an unsupervised abstractive summarization model which makes use of the Wasserstein distance.
We show that WassOS almost always outperforms the state-of-the-art on ROUGE metrics and consistently produces the best summaries according to human evaluations.
arXiv Detail & Related papers (2022-11-27T19:45:38Z)
- Few-Shot Learning for Opinion Summarization [117.70510762845338]
Opinion summarization is the automatic creation of text reflecting subjective information expressed in multiple documents.
In this work, we show that even a handful of summaries is sufficient to bootstrap generation of the summary text.
Our approach substantially outperforms previous extractive and abstractive methods in automatic and human evaluation.
arXiv Detail & Related papers (2020-04-30T15:37:38Z)
- Unsupervised Opinion Summarization with Noising and Denoising [85.49169453434554]
We create a synthetic dataset from a corpus of user reviews by sampling a review, pretending it is a summary, and generating noisy versions thereof.
At test time, the model accepts genuine reviews and generates a summary containing salient opinions, treating those that do not reach consensus as noise.
arXiv Detail & Related papers (2020-04-21T16:54:57Z)