Probing Factually Grounded Content Transfer with Factual Ablation
- URL: http://arxiv.org/abs/2203.10133v1
- Date: Fri, 18 Mar 2022 19:18:54 GMT
- Title: Probing Factually Grounded Content Transfer with Factual Ablation
- Authors: Peter West, Chris Quirk, Michel Galley, Yejin Choi
- Abstract summary: Grounded generation draws on a reliable external document (grounding) for factual information.
Measuring factuality is also simplified--to factual consistency, testing whether the generation agrees with the grounding, rather than all facts.
We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding.
- Score: 68.78413677690321
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent success, large neural models often generate factually
incorrect text. Compounding this is the lack of a standard automatic evaluation
for factuality--it cannot be meaningfully improved if it cannot be measured.
Grounded generation promises a path to solving both of these problems: models
draw on a reliable external document (grounding) for factual information,
simplifying the challenge of factuality. Measuring factuality is also
simplified--to factual consistency, testing whether the generation agrees with
the grounding, rather than all facts. Yet, without a standard automatic metric
for factual consistency, factually grounded generation remains an open problem.
We study this problem for content transfer, in which generations extend a
prompt, using information from factual grounding. Particularly, this domain
allows us to introduce the notion of factual ablation for automatically
measuring factual consistency: this captures the intuition that the model
should be less likely to produce an output given a less relevant grounding
document. In practice, we measure this by presenting a model with two grounding
documents, and the model should prefer to use the more factually relevant one.
We contribute two evaluation sets to measure this. Applying our new evaluation,
we propose multiple novel methods improving over strong baselines.
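Below is a minimal sketch of the factual-ablation check described above: score a reference continuation under the more relevant and the less relevant grounding document, and test that the model prefers the relevant one. It assumes a HuggingFace causal LM and a simple log-likelihood comparison; the model name ("gpt2") and the scoring details are illustrative placeholders, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper's grounded generation models may differ.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def continuation_logprob(grounding: str, prompt: str, continuation: str) -> float:
    """Total log-probability of the continuation given grounding + prompt."""
    context_ids = tokenizer(grounding + "\n" + prompt, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([context_ids, cont_ids], dim=1)
    # Mask the context so the loss is computed only over continuation tokens.
    labels = input_ids.clone()
    labels[:, : context_ids.shape[1]] = -100
    with torch.no_grad():
        out = model(input_ids, labels=labels)
    # out.loss is the mean negative log-likelihood over the continuation tokens;
    # negate and rescale to get the summed log-probability.
    return -out.loss.item() * cont_ids.shape[1]


def prefers_relevant_grounding(relevant: str, ablated: str, prompt: str, reference: str) -> bool:
    """Factual ablation: the reference extension should be more likely under the
    factually relevant grounding than under the less relevant (ablated) one."""
    return continuation_logprob(relevant, prompt, reference) > continuation_logprob(
        ablated, prompt, reference
    )
```

Accuracy of this preference over an evaluation set of (prompt, reference, grounding-pair) examples would then serve as an automatic factual-consistency score, in the spirit of the evaluation sets the paper contributes.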
Related papers
- Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study [61.74571814707054]
We evaluate whether every generated sentence is grounded in retrieved documents or the model's pre-training data.
Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded.
Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations.
arXiv Detail & Related papers (2024-04-10T14:50:10Z) - Know When To Stop: A Study of Semantic Drift in Text Generation [9.76171773410722]
We show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later.
This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation.
arXiv Detail & Related papers (2024-04-08T11:25:30Z) - How Well Do Large Language Models Truly Ground? [39.39062385290276]
A common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models.
Previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response.
We propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge.
arXiv Detail & Related papers (2023-11-15T16:11:27Z) - "According to ...": Prompting Language Models Improves Quoting from
Pre-Training Data [52.03853726206584]
Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data.
We propose according-to prompting: directing LLMs to ground responses against previously observed text.
To quantify this grounding, we propose a novel evaluation metric (QUIP-Score) that measures the extent to which model-produced answers are directly found in underlying text corpora (a toy sketch of this overlap idea appears after this list).
arXiv Detail & Related papers (2023-05-22T17:25:24Z) - Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description of an entity given a set of guiding keys and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z) - FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for
Abstractive Summarization [91.46015013816083]
We present FactPEGASUS, an abstractive summarization model that addresses the problem of factuality during pre-training and fine-tuning.
Our analysis suggests FactPEGASUS is more factual than a model trained with the original pre-training objective in zero-shot and few-shot settings.
arXiv Detail & Related papers (2022-05-16T17:39:14Z) - Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z)