Masked Summarization to Generate Factually Inconsistent Summaries for
Improved Factual Consistency Checking
- URL: http://arxiv.org/abs/2205.02035v1
- Date: Wed, 4 May 2022 12:48:49 GMT
- Title: Masked Summarization to Generate Factually Inconsistent Summaries for
Improved Factual Consistency Checking
- Authors: Hwanhee Lee, Kang Min Yoo, Joonsuk Park, Hwaran Lee, Kyomin Jung
- Abstract summary: We propose to generate factually inconsistent summaries using source texts and reference summaries with key information masked.
Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using our method generally outperform existing models.
- Score: 28.66287193703365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent advances in abstractive summarization systems, it is still
difficult to determine whether a generated summary is factually consistent with
the source text. To this end, the latest approach is to train a factual
consistency classifier on factually consistent and inconsistent summaries.
Luckily, the former is readily available as reference summaries in existing
summarization datasets. However, generating the latter remains a challenge, as
they need to be factually inconsistent, yet closely relevant to the source text
to be effective. In this paper, we propose to generate factually inconsistent
summaries using source texts and reference summaries with key information
masked. Experiments on seven benchmark datasets demonstrate that factual
consistency classifiers trained on summaries generated using our method
generally outperform existing models and show a competitive correlation with
human judgments. We also analyze the characteristics of the summaries generated
using our method. We will release the pre-trained model and the code at
https://github.com/hwanheelee1993/MFMA.
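The masking step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the heuristic for "key information" (capitalized words and tokens containing digits), the `<mask>` token, the masking ratios, and all function names below are assumptions made for illustration only. The sketch only prepares the masked source and masked reference summary; the model that would fill in the masks to produce an inconsistent summary is not included.

```python
import random

MASK = "<mask>"  # assumed placeholder token, not taken from the paper


def is_key_token(token: str) -> bool:
    """Crude stand-in for 'key information': capitalized words and numbers."""
    core = token.strip(".,;:!?\"'()")
    return bool(core) and (core[0].isupper() or any(ch.isdigit() for ch in core))


def mask_key_tokens(text: str, ratio: float, rng: random.Random) -> str:
    """Replace a fraction `ratio` of the key tokens in `text` with MASK."""
    tokens = text.split()
    candidates = [i for i, tok in enumerate(tokens) if is_key_token(tok)]
    k = round(len(candidates) * ratio)
    chosen = set(rng.sample(candidates, k))
    # A masked token is replaced whole, including any attached punctuation.
    return " ".join(MASK if i in chosen else tok for i, tok in enumerate(tokens))


def build_infilling_input(source: str, reference: str,
                          source_ratio: float, summary_ratio: float,
                          seed: int = 0) -> dict:
    """Mask both the source text and the reference summary.

    A separate infilling model (hypothetical here) would take both fields
    and generate a summary; with key facts hidden in the source, the filled
    summary is likely to be inconsistent with the original source.
    """
    rng = random.Random(seed)
    return {
        "masked_source": mask_key_tokens(source, source_ratio, rng),
        "masked_summary": mask_key_tokens(reference, summary_ratio, rng),
    }


if __name__ == "__main__":
    example = build_infilling_input(
        source="Alice moved to Paris in 2019. She now works at a bakery.",
        reference="Alice moved to Paris in 2019.",
        source_ratio=0.5,
        summary_ratio=1.0,
    )
    print(example["masked_source"])
    print(example["masked_summary"])
```

In this sketch, a higher source ratio hides more of the evidence, which would make the filled-in summary more likely to diverge from the source. A lower summary ratio keeps the output closer to the reference.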
Related papers
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- Evaluating the Factual Consistency of Large Language Models Through News Summarization [97.04685401448499]
We propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization.
For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent.
For factually inconsistent summaries, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.
arXiv Detail & Related papers (2022-11-15T18:50:34Z)
- Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency [14.974996886744083]
We release SummFC, a filtered summarization dataset with improved factual consistency.
We argue that our dataset should become a valid benchmark for developing and evaluating summarization systems.
arXiv Detail & Related papers (2022-10-31T15:04:20Z)
- Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling [56.70682379371534]
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
arXiv Detail & Related papers (2022-10-22T07:16:19Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Multi-Fact Correction in Abstractive Text Summarization [98.27031108197944]
Span-Fact is a suite of two factual correction models that leverages knowledge learned from question answering models to make corrections in system-generated summaries via span selection.
Our models employ single or multi-masking strategies to either iteratively or auto-regressively replace entities in order to ensure semantic consistency w.r.t. the source text.
Experiments show that our models significantly boost the factual consistency of system-generated summaries without sacrificing summary quality in terms of both automatic metrics and human evaluation.
arXiv Detail & Related papers (2020-10-06T02:51:02Z)
- Enhancing Factual Consistency of Abstractive Summarization [57.67609672082137]
We propose a fact-aware summarization model FASum to extract and integrate factual relations into the summary generation process.
We then design a factual corrector model FC to automatically correct factual errors from summaries generated by existing systems.
arXiv Detail & Related papers (2020-03-19T07:36:10Z)