Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
- URL: http://arxiv.org/abs/2404.03745v3
- Date: Mon, 12 Aug 2024 14:13:15 GMT
- Title: Fakes of Varying Shades: How Warning Affects Human Perception and Engagement Regarding LLM Hallucinations
- Authors: Mahjabin Nahar, Haeseung Seo, Eun-Ju Lee, Aiping Xiong, Dongwon Lee
- Abstract summary: This research aims to understand the human perception of hallucinations by systematically varying the degree of hallucination.
We observed that warning improved the detection of hallucination without significantly affecting the perceived truthfulness of genuine content.
- Score: 9.740345290187307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread adoption and transformative effects of large language models (LLMs) have sparked concerns regarding their capacity to produce inaccurate and fictitious content, referred to as "hallucinations". Given the potential risks associated with hallucinations, humans should be able to identify them. This research aims to understand the human perception of LLM hallucinations by systematically varying the degree of hallucination (genuine, minor hallucination, major hallucination) and examining its interaction with warning (i.e., a warning of potential inaccuracies: absent vs. present). Participants (N=419) from Prolific rated the perceived accuracy and engaged with content (e.g., like, dislike, share) in a Q/A format. Participants ranked content as truthful in the order of genuine, minor hallucination, and major hallucination, and user engagement behaviors mirrored this pattern. More importantly, we observed that warning improved the detection of hallucination without significantly affecting the perceived truthfulness of genuine content. We conclude by offering insights for future tools to aid human detection of hallucinations. All survey materials, demographic questions, and post-session questions are available at: https://github.com/MahjabinNahar/fakes-of-varying-shades-survey-materials
Related papers
- Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs [45.13670875211498]
Large Language Models (LLMs) often generate outputs that lack grounding in real-world facts, a phenomenon known as hallucinations.
We show that models can hallucinate with high certainty even when they have the correct knowledge.
arXiv Detail & Related papers (2025-02-18T15:46:31Z) - Can Your Uncertainty Scores Detect Hallucinated Entity? [14.432545893757677]
We propose a new dataset, HalluEntity, which annotates hallucination at the entity level.
Based on the dataset, we evaluate uncertainty-based hallucination detection approaches across 17 modern LLMs.
Our experimental results show that uncertainty estimation approaches focusing on individual token probabilities tend to over-predict hallucinations.
arXiv Detail & Related papers (2025-02-17T16:01:41Z) - Valuable Hallucinations: Realizable Non-realistic Propositions [2.451326684641447]
This paper introduces the first formal definition of valuable hallucinations in large language models (LLMs).
We focus on the potential value that certain types of hallucinations can offer in specific contexts.
We present experiments using the Qwen2.5 model and HalluQA dataset, employing ReAct prompting to control and optimize hallucinations.
arXiv Detail & Related papers (2025-02-16T12:59:11Z) - On Large Language Models' Hallucination with Regard to Known Facts [74.96789694959894]
Large language models are successful in answering factoid questions but are also prone to hallucination.
We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics.
Our study sheds light on understanding the reasons for LLMs' hallucinations on their known facts, and more importantly, on accurately predicting when they are hallucinating.
arXiv Detail & Related papers (2024-03-29T06:48:30Z) - Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models [35.45859414670449]
We introduce a refined taxonomy of hallucinations, featuring a new category: Event Hallucination.
We then utilize advanced LLMs to generate and filter fine-grained hallucinatory data consisting of various types of hallucinations.
The proposed benchmark distinctively assesses LVLMs' ability to tackle a broad spectrum of hallucinations.
arXiv Detail & Related papers (2024-02-24T05:14:52Z) - A Survey on Hallucination in Large Vision-Language Models [18.540878498840435]
Large Vision-Language Models (LVLMs) have attracted growing attention within the AI landscape for their practical implementation potential.
However, "hallucination", or more specifically, the misalignment between factual visual content and the corresponding textual generation, poses a significant challenge to utilizing LVLMs.
We dissect LVLM-related hallucinations in an attempt to establish an overview and facilitate future mitigation.
arXiv Detail & Related papers (2024-02-01T00:33:21Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generate factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to the trustworthy and reliable deployment of large language models.
Three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation).
This work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z) - HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method successfully mitigates 44.6% of hallucinations in relative terms and maintains competitive performance compared to LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z) - Evaluation and Analysis of Hallucination in Large Vision-Language Models [49.19829480199372]
Large Vision-Language Models (LVLMs) have recently achieved remarkable success.
LVLMs are still plagued by the hallucination problem.
Hallucination refers to the information of LVLMs' responses that does not exist in the visual input.
arXiv Detail & Related papers (2023-08-29T08:51:24Z) - HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models [146.87696738011712]
Large language models (LLMs) are prone to generate hallucinations, i.e., content that conflicts with the source or cannot be verified by the factual knowledge.
To understand what types of content and to which extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval).
arXiv Detail & Related papers (2023-05-19T15:36:27Z)