LLM Lies: Hallucinations are not Bugs, but Features as Adversarial
Examples
- URL: http://arxiv.org/abs/2310.01469v2
- Date: Wed, 4 Oct 2023 17:53:49 GMT
- Title: LLM Lies: Hallucinations are not Bugs, but Features as Adversarial
Examples
- Authors: Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan
- Abstract summary: We show that nonsensical prompts composed of random tokens can also elicit hallucinated responses from LLMs.
This phenomenon suggests that hallucination may be another view of adversarial examples.
We formalize an automatic hallucination-triggering method, the hallucination attack, in an adversarial manner.
- Score: 15.528923770249774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, appear to be
knowledgeable and able to adapt to many tasks. However, their answers still cannot be fully
trusted, since LLMs suffer from hallucination: fabricating non-existent facts without the
user's awareness. The reasons for the existence and pervasiveness of hallucinations remain
unclear. In this paper, we demonstrate that nonsensical prompts composed of random tokens
can also elicit hallucinated responses from LLMs. This phenomenon suggests that hallucination
may be another view of adversarial examples, sharing similar features with conventional
adversarial examples as a basic property of LLMs. We therefore formalize an automatic
hallucination-triggering method, the hallucination attack, in an adversarial manner. Finally,
we examine the basic features of attacked adversarial prompts and propose a simple yet
effective defense strategy. Our code is released on GitHub.
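The abstract above formalizes an automatic hallucination-triggering method as an adversarial "hallucination attack". As a rough illustration only, and not the authors' released code, the sketch below shows one way such an attack could be structured: a greedy token-substitution search that perturbs a prompt to raise a model's likelihood of emitting a chosen fabricated response. The function name, the target_logprob scorer, and the hyperparameters are hypothetical placeholders.

```python
import random
from typing import Callable, List, Sequence

def hallucination_attack_sketch(
    prompt_tokens: List[int],
    vocab: Sequence[int],
    target_logprob: Callable[[List[int]], float],
    n_iters: int = 200,
    n_candidates: int = 32,
    seed: int = 0,
) -> List[int]:
    """Greedy token-substitution search (illustrative, not the paper's exact algorithm).

    At each iteration, pick a random prompt position, try a handful of replacement
    tokens, and keep whichever substitution most increases the caller-supplied
    log-probability that the model emits the chosen hallucinated target response.
    """
    rng = random.Random(seed)
    best = list(prompt_tokens)
    best_score = target_logprob(best)
    for _ in range(n_iters):
        pos = rng.randrange(len(best))
        for tok in rng.sample(list(vocab), k=min(n_candidates, len(vocab))):
            cand = best[:pos] + [tok] + best[pos + 1:]
            score = target_logprob(cand)
            if score > best_score:
                best, best_score = cand, score
    return best

if __name__ == "__main__":
    # Toy scorer standing in for a real LLM's log-likelihood of the target response.
    dummy_score = lambda toks: -float(sum(abs(t - 7) for t in toks))
    print(hallucination_attack_sketch([1, 2, 3, 4], vocab=range(16), target_logprob=dummy_score))
```

In practice the scorer would query the attacked LLM; the toy scorer above only makes the sketch runnable end to end.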
Related papers
- DecoPrompt: Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises [28.72485319617863]
We propose a new prompting algorithm, named DecoPrompt, to mitigate hallucination.
DecoPrompt leverages LLMs to "decode" the false-premise prompts without actually eliciting hallucinated output from the LLMs.
We perform experiments on two datasets, demonstrating that DecoPrompt can reduce hallucinations effectively on outputs from different LLMs.
arXiv Detail & Related papers (2024-11-12T00:48:01Z)
- Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate [34.17353224636788]
We argue that hallucination in MLLMs is partially due to a lack of slow-thinking and divergent-thinking in these models.
Our approach can not only mitigate hallucinations but also interpret why they occur and detail the specifics of the hallucination.
arXiv Detail & Related papers (2024-07-30T02:41:32Z)
- Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? [53.89380284760555]
Large vision-language models (LVLMs) produce captions that mention concepts that cannot be found in the image.
These hallucinations erode the trustworthiness of LVLMs and are arguably among the main obstacles to their ubiquitous adoption.
Recent work suggests that the addition of grounding objectives -- those that explicitly align image regions or objects to text spans -- reduces the amount of LVLM hallucination.
arXiv Detail & Related papers (2024-06-20T16:56:11Z)
- Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models [20.025123325871835]
Large Language Models (LLMs) generate hallucinated text when confronted with false premise questions.
We propose FAITH (False premise Attention head constraIning for miTigating Hallucinations), a novel and effective method to mitigate false premise hallucinations.
arXiv Detail & Related papers (2024-02-29T12:35:45Z)
- Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models [36.41071419735876]
We identify a semantic shift bias related to paragraph breaks ('\n\n') in large vision-language models (LVLMs).
This bias leads the model to infer that the contents following '\n\n' should be obviously different from the preceding contents, with less hallucinatory descriptions.
We find that deliberately inserting '\n\n' into the generated description can induce more hallucinations.
arXiv Detail & Related papers (2024-02-02T12:02:46Z)
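The Skip \n entry above suggests that preventing the model from emitting paragraph breaks can reduce hallucination. A minimal, dependency-free sketch of that idea, assuming a decoding loop that exposes per-step next-token logits, is shown below; the function name and token ids are illustrative, not the paper's implementation.

```python
import math
from typing import List, Sequence

def suppress_paragraph_breaks(logits: List[float], break_token_ids: Sequence[int]) -> List[float]:
    """Return a copy of the next-token logits in which every token that would emit
    a paragraph break (ids that decode to a double newline) is masked to -inf,
    so greedy or beam decoding can never select it."""
    masked = list(logits)
    for tid in break_token_ids:
        if 0 <= tid < len(masked):
            masked[tid] = -math.inf
    return masked
```

In a HuggingFace-style generation stack the same masking could be packaged as a custom logits processor, but the plain-list form keeps the sketch self-contained.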
- The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models [134.6697160940223]
Hallucination poses a great challenge to the trustworthy and reliable deployment of large language models.
Three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation).
This work presents a systematic empirical study of LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation.
arXiv Detail & Related papers (2024-01-06T12:40:45Z)
- Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z)
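The ICD entry above describes contrasting an original model with a version whose hallucinations have been deliberately induced. Below is a generic contrastive-decoding step written under that assumption: it boosts the original model's next-token logits and penalizes tokens the hallucination-induced model prefers. The weighting scheme, the alpha parameter, and the function names are illustrative rather than the paper's exact formulation.

```python
from typing import List

def contrast_step(base_logits: List[float],
                  induced_logits: List[float],
                  alpha: float = 1.0) -> List[float]:
    """One contrastive-decoding step: down-weight tokens favored by the
    deliberately hallucination-induced model relative to the original model."""
    assert len(base_logits) == len(induced_logits)
    return [(1.0 + alpha) * b - alpha * h for b, h in zip(base_logits, induced_logits)]

def greedy_next_token(base_logits: List[float], induced_logits: List[float]) -> int:
    """Pick the next token greedily from the contrasted scores."""
    scores = contrast_step(base_logits, induced_logits)
    return max(range(len(scores)), key=scores.__getitem__)
```

Larger alpha pushes the output further away from whatever the induced-hallucination model would say at each step.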
- OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation [124.9008419182485]
We present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy.
Our approach begins with the observation that most hallucinations are closely tied to knowledge aggregation patterns in the self-attention matrix.
Based on this observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue.
arXiv Detail & Related papers (2023-11-29T18:57:07Z)
- HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models [146.87696738011712]
Large language models (LLMs) are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.
To understand what types of content, and to what extent, LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval).
arXiv Detail & Related papers (2023-05-19T15:36:27Z)
- Evaluating Object Hallucination in Large Vision-Language Models [122.40337582958453]
This work presents the first systematic study of object hallucination in large vision-language models (LVLMs).
We find that LVLMs tend to generate objects that are inconsistent with the target images in the descriptions.
We propose a polling-based query method called POPE to evaluate object hallucination.
arXiv Detail & Related papers (2023-05-17T16:34:01Z)
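The POPE entry above describes a polling-based query method for measuring object hallucination. As a rough, self-contained sketch, and not the authors' implementation, the code below builds yes/no probes of the form "Is there a <object> in the image?" from the objects annotated in an image plus sampled absent objects, and scores a model's answers; the function names and sampling scheme are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

def build_polling_probes(present_objects: Sequence[str],
                         candidate_objects: Sequence[str],
                         n_negatives: int,
                         seed: int = 0) -> List[Tuple[str, bool]]:
    """Build (question, expected_answer) pairs: 'yes' probes for objects
    annotated in the image and 'no' probes for randomly sampled absent objects."""
    rng = random.Random(seed)
    present = set(present_objects)
    absent = [o for o in candidate_objects if o not in present]
    negatives = rng.sample(absent, k=min(n_negatives, len(absent)))
    probes = [(f"Is there a {o} in the image?", True) for o in present_objects]
    probes += [(f"Is there a {o} in the image?", False) for o in negatives]
    return probes

def score_answers(model_says_yes: Sequence[bool], expected: Sequence[bool]) -> Dict[str, float]:
    """Accuracy of the polled yes/no answers; low accuracy on the 'no' probes
    indicates more object hallucination."""
    correct = sum(a == e for a, e in zip(model_says_yes, expected))
    return {"accuracy": correct / max(len(expected), 1)}
```

A vision-language model would supply the yes/no answers; any negative-sampling strategy (random, popular, or adversarial objects) can be plugged into the candidate list.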
This list is automatically generated from the titles and abstracts of the papers on this site.