Know When To Stop: A Study of Semantic Drift in Text Generation
- URL: http://arxiv.org/abs/2404.05411v1
- Date: Mon, 8 Apr 2024 11:25:30 GMT
- Title: Know When To Stop: A Study of Semantic Drift in Text Generation
- Authors: Ava Spataru, Eric Hambro, Elena Voita, Nicola Cancedda
- Abstract summary: We show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later.
This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation.
- Score: 9.76171773410722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we explicitly show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later: this behavior was occasionally observed before but never properly measured. We develop a semantic drift score that measures the degree of separation between correct and incorrect facts in generated texts and confirm our hypothesis when generating Wikipedia-style biographies. This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation. We therefore explore the trade-off between information quantity and factual accuracy for several early-stopping methods and improve factuality by a large margin. We further show that reranking with semantic similarity improves these results, both compared to the baseline and when combined with early stopping. Finally, we try calling an external API to bring the model back onto the right generation path, but do not obtain positive results. Overall, our methods generalize and can be applied to any long-form text generation to produce more reliable information by balancing the trade-offs between factual accuracy, information quantity, and computational cost.
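To make the idea concrete, here is a minimal sketch of one way a separation-style drift score and its corresponding stopping point could be computed from per-fact correctness labels (atomic facts judged correct or incorrect by an external checker). The function name and the exact scoring formula are illustrative assumptions, not necessarily the paper's formulation.

```python
# Minimal sketch, not the paper's exact formulation: given a generation
# decomposed into atomic facts labeled correct (True) / incorrect (False),
# score how cleanly correct facts precede incorrect ones, and pick the
# split index that would make the best early-stopping point.

def drift_split(labels: list[bool]) -> tuple[float, int]:
    """Return (score, t): t maximizes the average of the accuracy of
    facts[:t] and the error rate of facts[t:]; score is in [0, 1]."""
    best_score, best_t = -1.0, 0
    for t in range(len(labels) + 1):
        head, tail = labels[:t], labels[t:]
        acc = sum(head) / len(head) if head else 1.0
        err = tail.count(False) / len(tail) if tail else 1.0
        if (acc + err) / 2 > best_score:
            best_score, best_t = (acc + err) / 2, t
    return best_score, best_t

facts = [True, True, True, False, False]   # correct-then-incorrect pattern
score, stop_at = drift_split(facts)
print(score, stop_at)  # 1.0 3 -> perfect separation; stop after fact 3
```

Note that at generation time a model cannot see oracle labels, so the paper's stopping methods must rely on generation-time signals; this sketch only illustrates the separation measure itself.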
Related papers
- FactAlign: Long-form Factuality Alignment of Large Language Models [35.067998820937284]
Large language models have demonstrated significant potential as the next-generation information access engines.
We propose FactAlign, a novel alignment framework designed to enhance the factuality of long-form responses.
Our experiments on open-domain prompts and information-seeking questions demonstrate that FactAlign significantly improves the factual accuracy of LLM responses.
arXiv Detail & Related papers (2024-10-02T16:03:13Z)
- FactGenius: Combining Zero-Shot Prompting and Fuzzy Relation Mining to Improve Fact Verification with Knowledge Graphs [0.0]
We present FactGenius, a novel method that enhances fact-checking by combining zero-shot prompting of large language models with fuzzy text matching on knowledge graphs.
The evaluation of FactGenius on the FactKG, a benchmark dataset for fact verification, demonstrates that it significantly outperforms existing baselines.
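As a rough illustration of the fuzzy-matching component (the relation names, threshold, and helper below are assumptions, not the paper's pipeline), candidate knowledge-graph relations can be pre-filtered by surface similarity to the claim before an LLM verifies them:

```python
from difflib import SequenceMatcher

def fuzzy_match(relation: str, claim: str, threshold: float = 0.4) -> bool:
    """True if any token of the relation name resembles any claim token."""
    rel_tokens = relation.lower().replace("_", " ").split()
    claim_tokens = claim.lower().split()
    return any(SequenceMatcher(None, r, c).ratio() >= threshold
               for r in rel_tokens for c in claim_tokens)

claim = "Barack Obama was born in Honolulu"
candidates = ["birth_place", "spouse", "profession"]
print([r for r in candidates if fuzzy_match(r, claim)])  # ['birth_place']
```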
arXiv Detail & Related papers (2024-06-03T13:24:37Z)
- Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability [58.582216812183496]
While language models (LMs) can sometimes generate factually correct text and estimate truth values of individual claims, current LMs also generate incorrect or nonsensical content and are difficult to edit and bring up to date.
We present a method called Deductive Closure Training (DCT) that uses LMs themselves to identify implications of (and contradictions within) the text that they generate.
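A heavily simplified sketch of the data-generation idea, assuming only a generic `generate(prompt) -> str` interface to the LM; the prompts and the greedy consistency selection are illustrative stand-ins for the paper's actual procedure:

```python
# Hypothetical sketch of a DCT-style data-generation step: expand seed
# claims with LM-generated implications, then keep a contradiction-free
# subset to use as fine-tuning data.

def deductive_closure(seed_claims, generate, n_implications=3):
    """Expand seed claims, then greedily keep a mutually consistent subset."""
    expanded = set(seed_claims)
    for claim in seed_claims:
        for _ in range(n_implications):
            imp = generate(f"State one fact implied by: {claim}").strip()
            if imp:
                expanded.add(imp)

    def consistent(a: str, b: str) -> bool:
        verdict = generate(
            "Do these two statements contradict each other? "
            f"Answer yes or no.\n1. {a}\n2. {b}"
        )
        return not verdict.strip().lower().startswith("yes")

    kept = []  # greedy selection; the paper may solve this more carefully
    for claim in expanded:
        if all(consistent(claim, other) for other in kept):
            kept.append(claim)
    return kept
```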
arXiv Detail & Related papers (2024-01-16T18:58:37Z)
- Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) have seen widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment: trained on data from the past, yet tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
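For illustration, here is one way a predicted duration could be consumed downstream; the predictor itself is stood in by a plain number (see the released code above for the actual implementation):

```python
from datetime import date

def is_still_valid(fact_date: date, predicted_duration_years: float,
                   today: date) -> bool:
    """Discard a dated fact once its predicted lifespan has elapsed."""
    age_years = (today - fact_date).days / 365.25
    return age_years <= predicted_duration_years

# A roster-style fact predicted to hold for about a year, written in 2020,
# checked in 2023: it should be discarded.
print(is_still_valid(date(2020, 9, 1), 1.0, today=date(2023, 5, 24)))  # False
```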
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
- Factuality Enhanced Language Models for Open-Ended Text Generation [60.27166549575472]
We design the FactualityPrompts test set and metrics to measure the factuality of LM generations.
We find that larger LMs are more factual than smaller ones, although a previous study suggests that larger LMs can be less truthful in terms of misconceptions.
We propose a factuality-enhanced training method that uses TopicPrefix for better awareness of facts, with sentence completion as the training objective.
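A minimal sketch of the TopicPrefix preprocessing idea, assuming a simple "title: sentence" format (the paper's exact prefix format and sentence splitter may differ):

```python
# Prepend the document's topic to each sentence so facts stay attributable
# to their subject even when sentences are seen out of context in training.

def topic_prefix(doc_title: str, text: str) -> list[str]:
    """Split a document into sentences and prefix each with its topic."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [f"{doc_title}: {s}." for s in sentences]

doc = "He was born in 1961. He served as the 44th president."
print(topic_prefix("Barack Obama", doc))
# ['Barack Obama: He was born in 1961.',
#  'Barack Obama: He served as the 44th president.']
```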
arXiv Detail & Related papers (2022-06-09T17:16:43Z)
- Probing Factually Grounded Content Transfer with Factual Ablation [68.78413677690321]
Grounded generation draws on a reliable external document (grounding) for factual information.
Measuring factuality is also simplified: rather than checking all facts, one tests factual consistency, i.e., whether the generation agrees with the grounding.
We study this problem for content transfer, in which generations extend a prompt using information from factual grounding.
arXiv Detail & Related papers (2022-03-18T19:18:54Z)
- Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
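As a crude illustration of arc-level decomposition (the actual model scores each arc with a trained entailment classifier; the plain set difference below is only a stand-in):

```python
# Flag dependency arcs in a generated sentence that have no counterpart in
# the source text. Requires: pip install spacy &&
# python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def arcs(sentence: str) -> set[tuple[str, str, str]]:
    """Represent a sentence as (head lemma, relation, child lemma) arcs."""
    doc = nlp(sentence)
    return {(tok.head.lemma_, tok.dep_, tok.lemma_)
            for tok in doc if tok.dep_ != "ROOT"}

def unsupported_arcs(source: str, generated: str):
    """Arcs in the generation that never occur in the source text."""
    return arcs(generated) - arcs(source)

src = "The company reported strong profits in 2021."
gen = "The company reported weak profits."
print(unsupported_arcs(src, gen))  # should flag the arc introducing 'weak'
```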
arXiv Detail & Related papers (2020-10-12T06:43:10Z)