How Well Do Large Language Models Truly Ground?
- URL: http://arxiv.org/abs/2311.09069v2
- Date: Sat, 29 Jun 2024 18:07:34 GMT
- Title: How Well Do Large Language Models Truly Ground?
- Authors: Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo
- Abstract summary: A common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models.
Previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response.
We propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge.
- Score: 39.39062385290276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge. We introduce a new dataset and a grounding metric to evaluate model capability under the definition. We perform experiments across 25 LLMs of different sizes and training methods and provide insights into factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
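The two-part definition of grounding above can be made concrete with a toy score. This listing does not describe how the authors' metric is computed, so the following is only a minimal sketch under loudly stated assumptions. It assumes the response, the necessary knowledge, and the provided context have already been reduced to sets of atomic facts, and that facts match by exact string equality. Criterion (1) becomes a recall over the necessary facts, and criterion (2) becomes the fraction of response facts supported by the context. The names `grounding_score` and `GroundingScore` are hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingScore:
    utilization: float    # criterion (1): share of necessary facts the response uses
    faithfulness: float   # criterion (2): share of response facts found in the context
    truly_grounded: bool  # True only if both criteria are fully satisfied


def grounding_score(
    response_facts: Iterable[str],
    necessary_facts: Iterable[str],
    context_facts: Iterable[str],
) -> GroundingScore:
    """Hypothetical set-based sketch, not the paper's metric.

    Assumes facts are pre-extracted atomic strings compared by exact equality.
    """
    response = set(response_facts)
    necessary = set(necessary_facts)
    context = set(context_facts)
    # (1) Does the response fully utilize the necessary knowledge?
    utilization = len(response & necessary) / len(necessary) if necessary else 1.0
    # (2) Does the response stay within the limits of the context's knowledge?
    faithfulness = len(response & context) / len(response) if response else 1.0
    return GroundingScore(
        utilization=utilization,
        faithfulness=faithfulness,
        truly_grounded=utilization == 1.0 and faithfulness == 1.0,
    )


# A response that gives one correct fact but misses another and adds an outside fact.
partial = grounding_score({"A", "D"}, necessary_facts={"A", "B"}, context_facts={"A", "B", "C"})
print(partial)  # GroundingScore(utilization=0.5, faithfulness=0.5, truly_grounded=False)
```

The example shows why "having the correct answer" is not enough under the stricter definition. The response contains the correct fact "A", but it misses "B" and adds "D", which the context does not support, so it fails both criteria.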
Related papers
- Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study [61.74571814707054]
We evaluate whether every generated sentence is grounded in retrieved documents or the model's pre-training data.
Across 3 datasets and 4 model families, our findings reveal that a significant fraction of generated sentences are consistently ungrounded.
Our results show that while larger models tend to ground their outputs more effectively, a significant portion of correct answers remains compromised by hallucinations.
Date: 2024-04-10T14:50:10Z
- Grounding Gaps in Language Model Generations [67.79817087930678]
We study whether large language models generate text that reflects human grounding.
We find that -- compared to humans -- LLMs generate language with less conversational grounding.
To understand the roots of the identified grounding gap, we examine the role of instruction tuning and preference optimization.
Date: 2023-11-15T17:40:27Z
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
Date: 2022-11-09T18:58:29Z
- Probing Factually Grounded Content Transfer with Factual Ablation [68.78413677690321]
Grounded generation draws on a reliable external document (grounding) for factual information.
Measuring factuality is also simplified to factual consistency: testing whether the generation agrees with the grounding, rather than with all facts.
We study this problem for content transfer, in which generations extend a prompt, using information from factual grounding.
Date: 2022-03-18T19:18:54Z
- LaMDA: Language Models for Dialog Applications [75.75051929981933]
LaMDA is a family of Transformer-based neural language models specialized for dialog.
Fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements.
Date: 2022-01-20T15:44:37Z