Zero-shot Faithful Factual Error Correction
- URL: http://arxiv.org/abs/2305.07982v2
- Date: Sat, 27 May 2023 15:38:29 GMT
- Title: Zero-shot Faithful Factual Error Correction
- Authors: Kung-Hsiang Huang, Hou Pong Chan, Heng Ji
- Abstract summary: Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models.
We present a zero-shot framework that formulates questions about input claims, looks for correct answers in the given evidence, and assesses the faithfulness of each correction based on its consistency with the evidence.
- Score: 53.121642212060536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Faithfully correcting factual errors is critical for maintaining the
integrity of textual knowledge bases and preventing hallucinations in
sequence-to-sequence models. Drawing on humans' ability to identify and correct
factual errors, we present a zero-shot framework that formulates questions
about input claims, looks for correct answers in the given evidence, and
assesses the faithfulness of each correction based on its consistency with the
evidence. Our zero-shot framework outperforms fully-supervised approaches, as
demonstrated by experiments on the FEVER and SciFact datasets, where our
outputs are shown to be more faithful. More importantly, the decomposable
nature of our framework inherently provides interpretability. Additionally, to
reveal the most suitable metrics for evaluating factual error corrections, we
analyze the correlation between commonly used metrics and human judgments in
terms of three different dimensions regarding intelligibility and faithfulness.
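The three stages named in the abstract (formulate questions about the claim, look for answers in the evidence, score each correction by its consistency with the evidence) can be illustrated with a toy sketch. This is not the authors' implementation: every function name here is hypothetical, the "questions" are simple cloze masks over content words, the "answers" are evidence words absent from the claim, and the faithfulness score is plain token overlap standing in for whatever consistency model the paper actually uses.

```python
import re

# Hypothetical stopword list for this illustration only.
STOPWORDS = {"a", "an", "the", "is", "was", "in", "of", "on", "and", "to", "by"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def make_questions(claim_tokens):
    """Yield (index, cloze) pairs, masking one content word at a time."""
    for i, tok in enumerate(claim_tokens):
        if tok not in STOPWORDS:
            yield i, claim_tokens[:i] + ["[MASK]"] + claim_tokens[i + 1:]


def answer_candidates(evidence, claim_tokens):
    """Toy answering step: evidence content words not already in the claim."""
    seen = set(claim_tokens)
    answers = []
    for tok in tokenize(evidence):
        if tok not in STOPWORDS and tok not in seen and tok not in answers:
            answers.append(tok)
    return answers


def faithfulness(candidate_tokens, evidence):
    """Fraction of candidate tokens found in the evidence.

    Assumption: token overlap is only a stand-in for a real consistency model.
    """
    if not candidate_tokens:
        return 0.0
    evidence_tokens = set(tokenize(evidence))
    hits = sum(tok in evidence_tokens for tok in candidate_tokens)
    return hits / len(candidate_tokens)


def correct(claim, evidence):
    """Return (best correction, score); the original claim is the baseline."""
    claim_tokens = tokenize(claim)
    best_score = faithfulness(claim_tokens, evidence)
    best_tokens = claim_tokens
    answers = answer_candidates(evidence, claim_tokens)
    for i, cloze in make_questions(claim_tokens):
        for answer in answers:
            # Fill the masked slot with the candidate answer.
            candidate = cloze[:i] + [answer] + cloze[i + 1:]
            score = faithfulness(candidate, evidence)
            if score > best_score:
                best_score, best_tokens = score, candidate
    return " ".join(best_tokens), best_score


if __name__ == "__main__":
    print(correct("Paris is the capital of Germany",
                  "Berlin is the capital of Germany"))
```

Because each stage is a separate function, the intermediate questions, answers, and scores can be inspected individually, which is the kind of interpretability the abstract attributes to a decomposable design.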
Related papers
- Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound achieves the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
- Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation [42.63902468258758]
We propose a novel metric to evaluate the factual consistency in text summarization via counterfactual estimation.
We conduct a series of experiments on three public abstractive text summarization datasets.
arXiv Detail & Related papers (2021-08-30T11:48:41Z)
- Annotating and Modeling Fine-grained Factuality in Summarization [36.88018450067003]
A major barrier to the practical use of abstractive summarization systems is their propensity to output summaries that are not faithful to the input and that contain factual errors.
We explore both synthetic and human-labeled data sources for training models to identify factual errors in summarization.
We show that our best factuality detection model enables training of more factual XSum summarization models by allowing us to identify non-factual tokens in the training data.
arXiv Detail & Related papers (2021-04-09T11:20:44Z)
- Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification [58.03725169462616]
We show theoretically that over-parametrization is not the only reason for over-confidence.
We prove that logistic regression is inherently over-confident, in the realizable, under-parametrized setting.
Perhaps surprisingly, we also show that over-confidence is not always the case.
arXiv Detail & Related papers (2021-02-15T21:38:09Z)
- Reliable Post hoc Explanations: Modeling Uncertainty in Explainability [44.9824285459365]
Black box explanations are increasingly being employed to establish model credibility in high-stakes settings.
However, prior work demonstrates that explanations generated by state-of-the-art techniques are inconsistent, unstable, and provide very little insight into their correctness and reliability.
We develop a novel Bayesian framework for generating local explanations along with their associated uncertainty.
arXiv Detail & Related papers (2020-08-11T22:52:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.