Evaluating Factuality in Generation with Dependency-level Entailment
- URL: http://arxiv.org/abs/2010.05478v2
- Date: Thu, 22 Oct 2020 06:35:58 GMT
- Title: Evaluating Factuality in Generation with Dependency-level Entailment
- Authors: Tanya Goyal, Greg Durrett
- Abstract summary: We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model, trained on automatically created data, can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
- Score: 57.5316011554622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite significant progress in text generation models, a serious limitation
is their tendency to produce text that is factually inconsistent with
information in the input. Recent work has studied whether textual entailment
systems can be used to identify factual errors; however, these sentence-level
entailment models are trained to solve a different problem than generation
filtering and they do not localize which part of a generation is non-factual.
In this paper, we propose a new formulation of entailment that decomposes it at
the level of dependency arcs. Rather than focusing on aggregate decisions, we
instead ask whether the semantic relationship manifested by individual
dependency arcs in the generated output is supported by the input. Human
judgments on this task are difficult to obtain; we therefore propose a method
to automatically create data based on existing entailment or paraphrase
corpora. Experiments show that our dependency arc entailment model trained on
this data can identify factual inconsistencies in paraphrasing and
summarization better than sentence-level methods or those based on question
generation, while additionally localizing the erroneous parts of the
generation.
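The arc-level formulation described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Arc` type, the `toy_arc_scorer` function (a word-overlap stand-in, not the paper's learned model), the hand-written arcs (no parser is run), and the rule that a generation counts as factual only when every arc is supported. The sketch only shows how per-arc decisions yield both an aggregate verdict and a localization of the unsupported parts.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Arc:
    head: str      # governing word
    relation: str  # dependency label, e.g. "nsubj"
    child: str     # dependent word


def toy_arc_scorer(source: str, arc: Arc) -> float:
    """Hypothetical stand-in for a learned arc entailment model.

    Returns 1.0 if both words of the arc occur in the source, else 0.0.
    A real model would judge the semantic relation, not word overlap.
    """
    tokens = set(source.lower().split())
    both_present = arc.head.lower() in tokens and arc.child.lower() in tokens
    return 1.0 if both_present else 0.0


def evaluate_generation(
    source: str,
    arcs: List[Arc],
    scorer: Callable[[str, Arc], float],
    threshold: float = 0.5,
) -> Tuple[bool, List[Arc]]:
    """Score each arc against the source and collect the unsupported ones.

    Assumed aggregation rule: the generation is factual only if no arc
    falls below the threshold.
    """
    unsupported = [arc for arc in arcs if scorer(source, arc) < threshold]
    return (len(unsupported) == 0, unsupported)


source = "the company hired three engineers in march"
# Hand-written arcs for the generated output "the company fired three engineers".
arcs = [
    Arc("fired", "nsubj", "company"),
    Arc("fired", "obj", "engineers"),
    Arc("engineers", "nummod", "three"),
]
is_factual, unsupported = evaluate_generation(source, arcs, toy_arc_scorer)
print(is_factual)
for arc in unsupported:
    print(arc.head, arc.relation, arc.child)
```

In this toy case the two arcs attached to "fired" are flagged, while the arc between "engineers" and "three" is supported by the input, which is the kind of localization a single sentence-level decision cannot provide.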
Related papers
- SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation [55.61004653386632]
Large Language Models (LLMs) often produce hallucinations, i.e., information that is unfaithful or not grounded in the input context.
This paper introduces a novel self-supervised method for generating a training set of unfaithful samples.
We then refine the model using a training process that encourages the generation of grounded outputs over unfaithful ones.
arXiv Detail & Related papers (2025-02-19T12:31:58Z)
- Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution.
It undergoes finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Learning to Filter Context for Retrieval-Augmented Generation [75.18946584853316]
Generation models are required to generate outputs given partially or entirely irrelevant passages.
FILCO identifies useful context based on lexical and information-theoretic approaches.
It trains context filtering models that can filter retrieved contexts at test time.
arXiv Detail & Related papers (2023-11-14T18:41:54Z)
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries [9.044665059626958]
We show that fine-grained decomposition is not always a winning strategy for factuality scoring.
We also show that small changes to previously proposed entailment-based scoring methods can result in better performance.
arXiv Detail & Related papers (2022-11-30T09:54:37Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation [23.646133241521614]
Learnable evaluation metrics have promised more accurate assessments by having higher correlations with human judgments.
Previous works relied on heuristically manipulated plausible examples to mimic possible system drawbacks.
We propose to tackle these issues by generating a more comprehensive set of implausible stories using plots, which are structured representations of controllable factors used to generate stories.
arXiv Detail & Related papers (2021-04-12T20:19:24Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)