"According to ...": Prompting Language Models Improves Quoting from
Pre-Training Data
- URL: http://arxiv.org/abs/2305.13252v2
- Date: Mon, 26 Feb 2024 20:50:33 GMT
- Title: "According to ...": Prompting Language Models Improves Quoting from
Pre-Training Data
- Authors: Orion Weller and Marc Marone and Nathaniel Weir and Dawn Lawrie and
Daniel Khashabi and Benjamin Van Durme
- Abstract summary: Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data.
We propose according-to prompting: directing LLMs to ground responses against previously observed text.
To quantify this grounding, we propose a novel evaluation metric (QUIP-Score) that measures the extent to which model-produced answers are directly found in underlying text corpora.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) may hallucinate and generate fake information,
despite pre-training on factual data. Inspired by the journalistic device of
"according to sources", we propose according-to prompting: directing LLMs to
ground responses against previously observed text. To quantify this grounding,
we propose a novel evaluation metric (QUIP-Score) that measures the extent to
which model-produced answers are directly found in underlying text corpora. We
illustrate with experiments on three corpora (Wikipedia, PubMed, and the U.S.
legal tax code) that these prompts improve grounding under our metrics, with
the additional benefit of often improving end-task performance. Furthermore,
prompts that ask the model to decrease grounding (or to ground to other
corpora) indeed decrease QUIP-Score, indicating the ability of LLMs to increase
or decrease grounded generations on request.
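To make the prompting idea concrete, the following is a minimal sketch of how grounding and anti-grounding instructions of this kind could be appended to a question. The exact wordings, the corpus keys, and the make_prompt helper are illustrative assumptions, not the paper's verbatim prompts.

```python
# Illustrative "according-to" style prompting (sketch only; the instruction
# wordings and helper below are assumptions, not the paper's exact prompts).

GROUNDING_INSTRUCTIONS = {
    "wikipedia": "Respond to this question using only information that can be attributed to Wikipedia.",
    "pubmed": "Respond to this question using only information that can be attributed to PubMed.",
    "none": "",  # baseline: no grounding instruction
    "anti": "Respond to this question without directly quoting any text you have seen before.",  # assumed anti-grounding variant
}


def make_prompt(question: str, grounding: str = "wikipedia") -> str:
    """Append the chosen grounding instruction to a question."""
    instruction = GROUNDING_INSTRUCTIONS[grounding]
    return f"{question} {instruction}".strip()


if __name__ == "__main__":
    print(make_prompt("In what year was the Hubble Space Telescope launched?"))
    print(make_prompt("In what year was the Hubble Space Telescope launched?", grounding="anti"))
```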
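QUIP-Score itself quantifies how much of a generation appears verbatim in a reference corpus. The toy implementation below computes a simple word n-gram precision against a small in-memory corpus; the actual metric is computed over full pre-training corpora with efficient membership data structures, so the corpus, tokenization, and n-gram size here are assumptions for illustration only.

```python
# Toy, QUIP-style grounding score: the fraction of a generation's word n-grams
# that appear verbatim in a reference corpus. The real metric runs over full
# pre-training corpora; the corpus, tokenization, and n-gram size below are
# assumptions for illustration.

from collections.abc import Iterable


def ngrams(tokens: list[str], n: int) -> Iterable[tuple[str, ...]]:
    """Yield all contiguous word n-grams of a token list."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_corpus_index(documents: list[str], n: int = 4) -> set[tuple[str, ...]]:
    """Collect every word n-gram observed in the corpus."""
    seen: set[tuple[str, ...]] = set()
    for doc in documents:
        seen.update(ngrams(doc.lower().split(), n))
    return seen


def quip_like_score(generation: str, corpus_index: set[tuple[str, ...]], n: int = 4) -> float:
    """Precision of the generation's n-grams against the corpus (0.0 if too short)."""
    gen_ngrams = list(ngrams(generation.lower().split(), n))
    if not gen_ngrams:
        return 0.0
    hits = sum(1 for g in gen_ngrams if g in corpus_index)
    return hits / len(gen_ngrams)


if __name__ == "__main__":
    corpus = ["the hubble space telescope was launched in 1990 aboard the space shuttle discovery"]
    index = build_corpus_index(corpus)
    # Fully quoted from the corpus -> score 1.0; unrelated text scores near 0.0.
    print(quip_like_score("the hubble space telescope was launched in 1990", index))
```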
Related papers
- What do Large Language Models Need for Machine Translation Evaluation? [12.42394213466485]
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z)
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
- Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts [20.933548500888595]
Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic.
Current static metrics for text difficulty, such as the Flesch-Kincaid Reading Ease score, are known to be crude and brittle.
We introduce and evaluate a new set of prompt-based metrics for text difficulty.
arXiv Detail & Related papers (2024-05-15T16:22:16Z)
- RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Model (LLM) representations for evaluation.
Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
- Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study [61.74571814707054]
We evaluate whether every generated sentence is grounded in retrieved documents or the model's pre-training data.
Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded.
Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations.
arXiv Detail & Related papers (2024-04-10T14:50:10Z)
- A Glitch in the Matrix? Locating and Detecting Language Model Grounding with Fakepedia [57.31074448586854]
Large language models (LLMs) have an impressive ability to draw on novel information supplied in their context.
Yet the mechanisms underlying this contextual grounding remain unknown.
We present a novel method to study grounding abilities using Fakepedia.
arXiv Detail & Related papers (2023-12-04T17:35:42Z)
- Effective Large Language Model Adaptation for Improved Grounding and Citation Generation [48.07830615309543]
This paper focuses on improving large language models (LLMs) by grounding their responses in retrieved passages and by providing citations.
We propose a new framework, AGREE, that improves the grounding from a holistic perspective.
Our framework tunes LLMs to self-ground the claims in their responses and to provide accurate citations to retrieved documents.
arXiv Detail & Related papers (2023-11-16T03:22:25Z)
- Enabling Large Language Models to Generate Text with Citations [37.64884969997378]
Large language models (LLMs) have emerged as a widely-used tool for information seeking.
Our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability.
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
arXiv Detail & Related papers (2023-05-24T01:53:49Z)