Questioning the Validity of Summarization Datasets and Improving Their
Factual Consistency
- URL: http://arxiv.org/abs/2210.17378v1
- Date: Mon, 31 Oct 2022 15:04:20 GMT
- Title: Questioning the Validity of Summarization Datasets and Improving Their
Factual Consistency
- Authors: Yanzhu Guo, Chloé Clavel, Moussa Kamal Eddine and Michalis
Vazirgiannis
- Abstract summary: We release SummFC, a filtered summarization dataset with improved factual consistency.
We argue that our dataset should become a valid benchmark for developing and evaluating summarization systems.
- Score: 14.974996886744083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The topic of summarization evaluation has recently attracted a surge of
attention due to the rapid development of abstractive summarization systems.
However, the formulation of the task is rather ambiguous: neither the
linguistics community nor the natural language processing community has
succeeded in giving a mutually agreed-upon definition. Due to this lack of well-defined
formulation, a large number of popular abstractive summarization datasets are
constructed in a manner that neither guarantees validity nor meets one of the
most essential criteria of summarization: factual consistency. In this paper,
we address this issue by combining state-of-the-art factual consistency models
to identify the problematic instances present in popular summarization
datasets. We release SummFC, a filtered summarization dataset with improved
factual consistency, and demonstrate that models trained on this dataset
achieve improved performance in nearly all quality aspects. We argue that our
dataset should become a valid benchmark for developing and evaluating
summarization systems.
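As a rough illustration of the filtering recipe, the sketch below scores each document-summary pair with a single off-the-shelf NLI model and keeps only pairs above a threshold. The checkpoint, input truncation, and threshold are assumptions for illustration; the paper combines several state-of-the-art consistency models rather than one NLI classifier.

```python
# A minimal sketch of consistency-based filtering, assuming a single
# off-the-shelf NLI model as the scorer. SummFC combines several
# state-of-the-art consistency models; the checkpoint, truncation,
# and threshold below are illustrative assumptions only.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def consistency_score(document: str, summary: str) -> float:
    """Probability that the (truncated) document entails the summary."""
    scores = nli({"text": document[:1500], "text_pair": summary}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

def filter_pairs(pairs, threshold=0.5):
    """Keep only (document, summary) pairs judged factually consistent."""
    return [(d, s) for d, s in pairs if consistency_score(d, s) >= threshold]
```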
Related papers
- SUMIE: A Synthetic Benchmark for Incremental Entity Summarization [6.149024468471498]
No existing dataset adequately tests how well language models can incrementally update entity summaries.
We introduce SUMIE, a fully synthetic dataset designed to expose real-world incremental entity summarization (IES) challenges.
This dataset effectively highlights problems like incorrect entity association and incomplete information presentation.
arXiv Detail & Related papers (2024-06-07T16:49:21Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
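In the spirit of the AMRFact entry above, here is a minimal sketch of graph-level perturbation with the penman library: decode an AMR, inject a controlled inconsistency, and re-encode. The toy AMR and the single error type (swapping two :ARG0 fillers) are illustrative assumptions, not the paper's full error taxonomy, and the paper additionally regenerates text from the perturbed graph.

```python
# Decode an AMR graph, swap the fillers of two :ARG0 roles to corrupt
# who-does-what, and re-encode. A hedged sketch, not AMRFact itself.
import penman

AMR = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 (d / dog)))
"""

def swap_agents(graph: penman.Graph) -> penman.Graph:
    """Swap the fillers of the first two :ARG0 roles, corrupting
    agent assignments while keeping the graph well-formed."""
    triples = list(graph.triples)
    idx = [i for i, (_, role, _) in enumerate(triples) if role == ":ARG0"]
    if len(idx) >= 2:
        i, j = idx[:2]
        (s1, r1, t1), (s2, r2, t2) = triples[i], triples[j]
        triples[i], triples[j] = (s1, r1, t2), (s2, r2, t1)
    return penman.Graph(triples)

# "The boy wants the dog to go" becomes "the dog wants the boy to go".
print(penman.encode(swap_agents(penman.decode(AMR))))
```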
- Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling [56.70682379371534]
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
arXiv Detail & Related papers (2022-10-22T07:16:19Z)
- Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking [28.66287193703365]
We propose to generate factually inconsistent summaries using source texts and reference summaries with key information masked.
Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using our method generally outperform existing models.
arXiv Detail & Related papers (2022-05-04T12:48:49Z)
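A rough sketch of the masking idea above, under the assumption that named entities stand in for the "key information" to hide: mask entities in the source, then summarize the masked text so the model is forced to hallucinate the missing facts, yielding negative examples for classifier training.

```python
# Mask named entities in the source, then summarize the masked text so
# the model must invent the hidden facts. spaCy NER stands in for the
# paper's key-information selection, which is an assumption here.
import spacy
from transformers import pipeline

nlp = spacy.load("en_core_web_sm")
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def mask_entities(text: str, mask: str = "<mask>") -> str:
    """Replace every named-entity span in the text with a mask token."""
    doc, out, last = nlp(text), [], 0
    for ent in doc.ents:
        out.append(text[last:ent.start_char])
        out.append(mask)
        last = ent.end_char
    out.append(text[last:])
    return "".join(out)

def inconsistent_summary(source: str) -> str:
    """Summarize the entity-masked source; the result is a plausible but
    unfaithful summary usable as a negative training example."""
    return summarizer(mask_entities(source), max_length=60)[0]["summary_text"]
```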
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies that misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
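For the ranking-based protocol above, Best-Worst Scaling judgments are typically aggregated with the standard counting estimator, score(item) = (#times best - #times worst) / #appearances; a minimal sketch follows, with the annotation record format being an assumption for illustration.

```python
# Standard counting estimator for Best-Worst Scaling:
#   score(item) = (#times best - #times worst) / #appearances.
from collections import Counter

def bws_scores(annotations):
    """annotations: iterable of (items_tuple, best_item, worst_item)."""
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        seen.update(items)
        best[b] += 1
        worst[w] += 1
    return {i: (best[i] - worst[i]) / seen[i] for i in seen}

# Example: two 4-tuples of candidate summaries judged by annotators.
anns = [
    (("s1", "s2", "s3", "s4"), "s1", "s4"),
    (("s1", "s2", "s3", "s4"), "s2", "s4"),
]
print(bws_scores(anns))  # {'s1': 0.5, 's2': 0.5, 's3': 0.0, 's4': -1.0}
```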
- Entity-level Factual Consistency of Abstractive Text Summarization [26.19686599842915]
A key challenge for abstractive summarization is ensuring the factual consistency of the generated summary with respect to the original document.
We propose a set of new metrics to quantify the entity-level factual consistency of generated summaries.
arXiv Detail & Related papers (2021-02-18T03:07:28Z)
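One simple way to operationalize entity-level metrics like those above is entity precision: the fraction of named entities in the summary that also occur in the source. The sketch below uses spaCy NER with lowercase exact matching, which is a simplification and may differ from the paper's actual metric definitions.

```python
# Entity precision: 1.0 means every summary entity is grounded in the
# source. Lowercase exact match is a simplifying assumption here.
import spacy

nlp = spacy.load("en_core_web_sm")

def entities(text: str) -> set:
    """Lowercased surface forms of all named entities in the text."""
    return {ent.text.lower() for ent in nlp(text).ents}

def entity_precision(source: str, summary: str) -> float:
    """Fraction of summary entities that also appear in the source."""
    summary_ents = entities(summary)
    if not summary_ents:
        return 1.0  # nothing to hallucinate
    return len(summary_ents & entities(source)) / len(summary_ents)
```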
- Unsupervised Opinion Summarization with Content Planning [58.5308638148329]
We show that explicitly incorporating content planning in a summarization model yields output of higher quality.
We also create synthetic datasets which are more natural, resembling real world document-summary pairs.
Our approach outperforms competitive models in generating informative, coherent, and fluent summaries.
arXiv Detail & Related papers (2020-12-14T18:41:58Z)
- Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944]
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-10-06T02:51:02Z)
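A single-mask sketch of the span-selection idea above: replace one entity in the summary with a placeholder, pose the result as a cloze-style question over the source, and splice in the extractive QA answer. The crude cloze-to-question conversion and the checkpoint are assumptions; the paper's iterative and auto-regressive multi-mask strategies are not reproduced here.

```python
# Swap one summary entity for the span an extractive QA model selects
# from the source. A hedged single-mask sketch of Span-Fact's idea.
import spacy
from transformers import pipeline

nlp = spacy.load("en_core_web_sm")
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def correct_first_entity(source: str, summary: str) -> str:
    """Replace the first entity in the summary with the QA model's
    answer when the masked summary is posed over the source."""
    ents = nlp(summary).ents
    if not ents:
        return summary
    ent = ents[0]
    cloze = summary[:ent.start_char] + "what" + summary[ent.end_char:]
    answer = qa(question=cloze, context=source)["answer"]
    return summary[:ent.start_char] + answer + summary[ent.end_char:]
```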
- Enhancing Factual Consistency of Abstractive Summarization [57.67609672082137]
We propose FASum, a fact-aware summarization model that extracts factual relations from the source and integrates them into the summary generation process.
We then design FC, a factual corrector model that automatically corrects factual errors in summaries generated by existing systems.
arXiv Detail & Related papers (2020-03-19T07:36:10Z)
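As a naive stand-in for FASum's relation extraction step, the sketch below pulls subject-verb-object triples from a spaCy dependency parse. This is a simplification for illustration only, not the paper's knowledge extraction pipeline.

```python
# Extract naive (subject, verb, object) triples from a dependency parse,
# a toy approximation of factual-relation extraction.
import spacy

nlp = spacy.load("en_core_web_sm")

def svo_triples(text: str):
    """Yield (subject, verb_lemma, object) triples per sentence."""
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ == "VERB":
                subjects = [c for c in tok.children if c.dep_ == "nsubj"]
                objects = [c for c in tok.children if c.dep_ == "dobj"]
                for s in subjects:
                    for o in objects:
                        yield (s.text, tok.lemma_, o.text)

print(list(svo_triples("The company acquired the startup in 2020.")))
# [('company', 'acquire', 'startup')]
```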