On the Origin of Hallucinations in Conversational Models: Is it the
Datasets or the Models?
- URL: http://arxiv.org/abs/2204.07931v1
- Date: Sun, 17 Apr 2022 05:15:24 GMT
- Title: On the Origin of Hallucinations in Conversational Models: Is it the
Datasets or the Models?
- Authors: Nouha Dziri, Sivan Milton, Mo Yu, Osmar Zaiane, Siva Reddy
- Abstract summary: We conduct a study on existing knowledge-grounded conversational benchmarks and several state-of-the-art models.
Standard benchmarks consist of >60% hallucinated responses, leading to models that not only hallucinate but even amplify hallucinations.
Our findings raise important questions on the quality of existing datasets and models trained using them.
- Score: 32.41234580068662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-grounded conversational models are known to suffer from producing
factually invalid statements, a phenomenon commonly called hallucination. In
this work, we investigate the underlying causes of this phenomenon: is
hallucination due to the training data, or to the models? We conduct a
comprehensive human study on both existing knowledge-grounded conversational
benchmarks and several state-of-the-art models. Our study reveals that the
standard benchmarks consist of >60% hallucinated responses, leading to models
that not only hallucinate but even amplify hallucinations. Our findings raise
important questions on the quality of existing datasets and models trained
using them. We make our annotations publicly available for future research.
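
As a rough, hypothetical sketch of the kind of comparison the abstract describes (not the authors' actual annotation pipeline), the snippet below computes a hallucination rate from human-annotated responses and an "amplification factor" comparing model outputs to the benchmark's gold responses; the record format and field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AnnotatedResponse:
    # Hypothetical record: one response judged by human annotators as either
    # faithful to the grounding knowledge or hallucinated.
    text: str
    is_hallucinated: bool


def hallucination_rate(responses: Iterable[AnnotatedResponse]) -> float:
    """Fraction of responses that annotators marked as hallucinated."""
    responses = list(responses)
    if not responses:
        return 0.0
    return sum(r.is_hallucinated for r in responses) / len(responses)


def amplification_factor(gold: Iterable[AnnotatedResponse],
                         generated: Iterable[AnnotatedResponse]) -> float:
    """Ratio of the model's hallucination rate to the benchmark's.

    A value above 1 means the model hallucinates more often than the
    (already noisy) data it was trained and evaluated on.
    """
    gold_rate = hallucination_rate(gold)
    gen_rate = hallucination_rate(generated)
    return float("inf") if gold_rate == 0 else gen_rate / gold_rate
```

Under this toy definition, a benchmark annotated at a 60% hallucination rate and a model whose outputs are flagged 75% of the time yields a factor of 1.25, i.e. the model amplifies the noise it was trained on.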
Related papers
- Distinguishing Ignorance from Error in LLM Hallucinations [43.62904897907926]
We focus on closed-book Question Answering (CBQA), where previous work has not fully addressed the distinction between two possible kinds of hallucinations.
We argue that distinguishing these cases is crucial for detecting and mitigating hallucinations.
arXiv Detail & Related papers (2024-10-29T14:31:33Z) - Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models [65.32990889402927]
We coin this phenomenon "knowledge overshadowing".
We show that the hallucination rate grows with both the imbalance ratio and the length of the dominant condition description.
We propose to utilize overshadowing conditions as a signal to catch hallucination before it is produced.
arXiv Detail & Related papers (2024-07-10T20:37:42Z) - VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models [59.05674402770661]
This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs).
VideoHallucer categorizes hallucinations into two main types: intrinsic and extrinsic, offering further subcategories for detailed analysis.
arXiv Detail & Related papers (2024-06-24T06:21:59Z) - On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on why LLMs hallucinate about facts they know and, more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z) - Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z) - Hallucinations in Neural Automatic Speech Recognition: Identifying
Errors and Hallucinatory Models [11.492702369437785]
Hallucinations are transcriptions that are semantically unrelated to the source utterance, yet still fluent and coherent.
We show that commonly used metrics, such as word error rates, cannot differentiate between hallucinatory and non-hallucinatory models.
We devise a framework for identifying hallucinations by analysing their semantic connection with the ground truth and their fluency.
arXiv Detail & Related papers (2024-01-03T06:56:56Z) - HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models [2.9312156642007294]
The Hallucination Ontology (HALO) is written in OWL and supports six different types of hallucination known to arise in large language models (LLMs).
We publish a dataset containing hallucinations that we inductively gathered across multiple independent Web sources, and show that HALO can be successfully used to model this dataset and answer competency questions.
arXiv Detail & Related papers (2023-12-08T17:57:20Z) - Evaluating Hallucinations in Chinese Large Language Models [65.4771562909392]
We establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models.
We consider two types of hallucinations: imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT.
For evaluation, we design an automated evaluation method using GPT-4 to judge whether a model output is hallucinated.
arXiv Detail & Related papers (2023-10-05T07:57:09Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource, black-box hallucination detection method based on self-contradiction (see the sketch after this list).
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
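
The AutoHall entry above mentions a zero-resource, black-box detection method based on self-contradiction. As a minimal sketch of that general idea (not the paper's actual implementation), one can sample a model several times on the same prompt and flag disagreement between the samples; the `generate` callable and the exact-match `agree` heuristic below are placeholders for a real LLM call and a stronger consistency check such as an NLI model.

```python
from typing import Callable, List


def detect_self_contradiction(
    prompt: str,
    generate: Callable[[str], str],  # placeholder for any black-box LLM call
    n_samples: int = 5,
    agree: Callable[[str, str], bool] = lambda a, b: a.strip().lower() == b.strip().lower(),
) -> bool:
    """Flag a prompt as likely hallucinated if repeated samples disagree.

    Zero-resource in the sense that no external knowledge base is consulted;
    only the model's own outputs are compared with one another.
    """
    samples: List[str] = [generate(prompt) for _ in range(n_samples)]
    first = samples[0]
    return not all(agree(first, other) for other in samples[1:])
```

Inconsistent answers across samples are treated as evidence that the model is guessing rather than recalling a stable fact, which is what makes the check possible without any external knowledge source.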
This list is automatically generated from the titles and abstracts of the papers on this site.