SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in
Summarization
- URL: http://arxiv.org/abs/2111.09525v1
- Date: Thu, 18 Nov 2021 05:02:31 GMT
- Title: SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in
Summarization
- Authors: Philippe Laban and Tobias Schnabel and Paul N. Bennett and Marti A.
Hearst
- Abstract summary: A key requirement for summaries is to be factually consistent with the input document.
Previous work has found that natural language inference models do not perform competitively when applied to inconsistency detection.
We provide a highly effective and light-weight method called SummaCConv that enables NLI models to be successfully used for this task.
- Score: 27.515873862013006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the summarization domain, a key requirement for summaries is to be
factually consistent with the input document. Previous work has found that
natural language inference (NLI) models do not perform competitively when
applied to inconsistency detection. In this work, we revisit the use of NLI for
inconsistency detection, finding that past work suffered from a mismatch in
input granularity between NLI datasets (sentence-level), and inconsistency
detection (document level). We provide a highly effective and light-weight
method called SummaCConv that enables NLI models to be successfully used for
this task by segmenting documents into sentence units and aggregating scores
between pairs of sentences. On our newly introduced benchmark called SummaC
(Summary Consistency) consisting of six large inconsistency detection datasets,
SummaCConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a
5 percentage-point improvement over prior work. We make the models and datasets
available: https://github.com/tingofurro/summac
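The aggregation idea described in the abstract (segment both texts into sentences, score every document/summary sentence pair with an NLI model, then aggregate) can be sketched roughly as follows. Note the hedges: `nli_entail_prob` is a hypothetical stand-in for a real NLI model's entailment probability, and the mean-of-max aggregation corresponds to the paper's zero-shot variant rather than the learned convolution used by SummaCConv.

```python
def nli_entail_prob(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model's entailment probability.

    Here it is a crude token-overlap ratio, for illustration only; a
    real system would query a trained NLI model instead.
    """
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)


def summac_zs_score(document: str, summary: str) -> float:
    """Zero-shot SummaC-style consistency score (sketch).

    Splits both texts into sentences (naively, on periods), scores all
    (document sentence, summary sentence) pairs, keeps the maximum over
    document sentences for each summary sentence, and averages.
    """
    doc_sents = [s.strip() for s in document.split(".") if s.strip()]
    sum_sents = [s.strip() for s in summary.split(".") if s.strip()]
    # For each summary sentence, keep its best-supporting document sentence.
    col_max = [
        max(nli_entail_prob(d, s) for d in doc_sents)
        for s in sum_sents
    ]
    # Mean over summary sentences yields one consistency score per pair.
    return sum(col_max) / len(col_max)
```

A summary sentence with no supporting document sentence drags the mean down, which is the intuition behind flagging inconsistent summaries with a low score.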
Related papers
- Using Similarity to Evaluate Factual Consistency in Summaries [2.7595794227140056]
Abstractive summarisers generate fluent summaries, but the factuality of the generated text is not guaranteed.
We propose a new zero-shot factuality evaluation metric, Sentence-BERTScore (SBERTScore), which compares sentences between the summary and the source document.
Our experiments indicate that each technique has different strengths, with SBERTScore particularly effective in identifying correct summaries.
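The sentence-comparison idea behind SBERTScore can be sketched as below. The `embed` function here is a hypothetical bag-of-words stand-in for actual Sentence-BERT embeddings, so this illustrates only the max-similarity aggregation, not the paper's metric itself.

```python
import math
from collections import Counter


def embed(sentence: str) -> Counter:
    """Stand-in embedding: lower-cased bag-of-words counts.

    A real SBERTScore implementation would use Sentence-BERT vectors.
    """
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def sbertscore_like(source: str, summary: str) -> float:
    """Score each summary sentence by its closest source sentence (sketch)."""
    src = [embed(s) for s in source.split(".") if s.strip()]
    summ = [embed(s) for s in summary.split(".") if s.strip()]
    best = [max(cosine(s, d) for d in src) for s in summ]
    return sum(best) / len(best)
```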
arXiv Detail & Related papers (2024-09-23T15:02:38Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA, and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond [135.8013388183257]
We propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits.
Most LLMs struggle on SummEdits, with performance close to random chance.
The best-performing model, GPT-4, is still 8% below estimated human performance.
arXiv Detail & Related papers (2023-05-23T21:50:06Z)
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z)
- Evaluating the Factual Consistency of Large Language Models Through News Summarization [97.04685401448499]
We propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization.
For factually consistent summaries, we use human-written reference summaries that we manually verify as factually consistent.
For factually inconsistent summaries, we generate summaries from a suite of summarization models that we have manually annotated as factually inconsistent.
arXiv Detail & Related papers (2022-11-15T18:50:34Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking [28.66287193703365]
We propose to generate factually inconsistent summaries using source texts and reference summaries with key information masked.
Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using our method generally outperform existing models.
arXiv Detail & Related papers (2022-05-04T12:48:49Z)
- Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on.
We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset.
arXiv Detail & Related papers (2022-04-15T12:56:39Z)
- MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction [48.55908127994688]
We propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE).
Through key-value matching based on relevancy evaluation, MatchVIE can bypass recognizing various semantics and focus on the strong relevancy between entities.
We introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values.
arXiv Detail & Related papers (2021-06-24T12:06:29Z)
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI -- a newly-constructed large-scale dataset for document-level NLI.
arXiv Detail & Related papers (2021-06-17T13:02:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.