Falsesum: Generating Document-level NLI Examples for Recognizing Factual
Inconsistency in Summarization
- URL: http://arxiv.org/abs/2205.06009v1
- Date: Thu, 12 May 2022 10:43:42 GMT
- Authors: Prasetya Ajie Utama, Joshua Bambrick, Nafise Sadat Moosavi, Iryna
Gurevych
- Abstract summary: We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural abstractive summarization models are prone to generate summaries which
are factually inconsistent with their source documents. Previous work has
introduced the task of recognizing such factual inconsistency as a downstream
application of natural language inference (NLI). However, state-of-the-art NLI
models perform poorly in this context due to their inability to generalize to
the target task. In this work, we show that NLI models can be effective for
this task when the training data is augmented with high-quality task-oriented
examples. We introduce Falsesum, a data generation pipeline leveraging a
controllable text generation model to perturb human-annotated summaries,
introducing varying types of factual inconsistencies. Unlike previously
introduced document-level NLI datasets, our generated dataset contains examples
that are diverse and inconsistent yet plausible. We show that models trained on
a Falsesum-augmented NLI dataset improve the state-of-the-art performance
across four benchmarks for detecting factual inconsistency in summarization.
The code to obtain the dataset is available online at
https://github.com/joshbambrick/Falsesum
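At a high level, the pipeline pairs each document with its human-annotated summary as an entailed hypothesis, and with a perturbed summary as a non-entailed one. The following is a minimal sketch of that example construction; `perturb` is a hypothetical stand-in for the controllable generation model described in the paper, which introduces varied factual inconsistencies rather than a fixed string edit.

```python
# Sketch of assembling document-level NLI examples in the Falsesum
# style: the original (document, summary) pair yields an entailment
# example, and a perturbed summary yields a non-entailment example.

def perturb(summary: str) -> str:
    # Placeholder only: the real pipeline uses a controllable text
    # generation model to inject factual inconsistencies.
    return summary.replace("paris", "london")

def make_nli_examples(document: str, summary: str) -> list[dict]:
    return [
        {"premise": document, "hypothesis": summary,
         "label": "entailment"},
        {"premise": document, "hypothesis": perturb(summary),
         "label": "non-entailment"},
    ]

examples = make_nli_examples(
    "the eiffel tower is in paris .",
    "the tower stands in paris .",
)
for ex in examples:
    print(ex["label"], "|", ex["hypothesis"])
```

Training an off-the-shelf NLI model on a mixture of such generated pairs and standard sentence-level NLI data is what closes the domain gap the abstract describes.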
Related papers
- TrueTeacher: Learning Factual Consistency Evaluation with Large Language
Models [20.09470051458651]
We introduce TrueTeacher, a method for generating synthetic data by annotating diverse model-generated summaries.
Unlike prior work, TrueTeacher does not rely on human-written summaries, and is multilingual by nature.
arXiv Detail & Related papers (2023-05-18T17:58:35Z)
- mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z)
- WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning [40.5830891229718]
We propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck.
Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement on average over previous state-of-the-art methods on the TRUE benchmark.
arXiv Detail & Related papers (2022-12-20T08:04:36Z)
- Evaluating the Factual Consistency of Large Language Models Through News Summarization [97.04685401448499]
We propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization.
For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent.
For factually inconsistent summaries, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.
arXiv Detail & Related papers (2022-11-15T18:50:34Z)
- Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling [56.70682379371534]
We show that our approach vastly outperforms prior methods in correcting erroneous summaries.
Our model -- FactEdit -- improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum.
arXiv Detail & Related papers (2022-10-22T07:16:19Z)
- Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking [28.66287193703365]
We propose to generate factually inconsistent summaries using source texts and reference summaries with key information masked.
Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using our method generally outperform existing models.
arXiv Detail & Related papers (2022-05-04T12:48:49Z)
- SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization [27.515873862013006]
A key requirement for summaries is to be factually consistent with the input document.
Previous work has found that natural language inference models do not perform competitively when applied to inconsistency detection.
We provide a highly effective and lightweight method called SummaCConv that enables NLI models to be successfully used for this task.
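The SummaC family scores consistency at the sentence level: each summary sentence is compared against every document sentence with an NLI model, and the resulting score matrix is aggregated. Below is a sketch of the zero-shot-style aggregation (max over document sentences, mean over summary sentences); SummaCConv instead learns the aggregation. `nli_entailment_prob` is a hypothetical stand-in for a real NLI model's entailment probability, implemented here as a toy word-overlap proxy.

```python
# Sketch of SummaC-style sentence-level NLI score aggregation.

def nli_entailment_prob(premise: str, hypothesis: str) -> float:
    # Toy proxy based on word overlap; a real system would query an
    # NLI model for the entailment probability instead.
    p, h = set(premise.split()), set(hypothesis.split())
    return len(p & h) / max(len(h), 1)

def consistency_score(doc_sents: list[str], sum_sents: list[str]) -> float:
    # For each summary sentence, take the best (max) entailment score
    # over all document sentences, then average across the summary.
    per_sentence = [
        max(nli_entailment_prob(d, s) for d in doc_sents)
        for s in sum_sents
    ]
    return sum(per_sentence) / len(per_sentence)
```

Reducing a document-summary pair to many sentence-pair judgments is what lets off-the-shelf sentence-level NLI models work here despite the document-level input.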
arXiv Detail & Related papers (2021-11-18T05:02:31Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- Adversarial NLI for Factual Correctness in Text Summarisation Models [0.0]
We use the Adversarial NLI dataset to train the NLI model.
We show that the model has the potential to enhance factual correctness in abstractive summarization.
arXiv Detail & Related papers (2020-05-24T13:02:57Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives can improve performance upon baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.