Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
- URL: http://arxiv.org/abs/2406.05494v1
- Date: Sat, 8 Jun 2024 15:20:56 GMT
- Title: Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
- Authors: Neeraj Varshney, Satyam Raj, Venkatesh Mishra, Agneet Chatterjee, Ritika Sarkar, Amir Saeidi, Chitta Baral
- Abstract summary: Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
LLMs have been shown to suffer from a critical limitation pertinent to 'hallucination' in their output.
We study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'.
We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation.
- Score: 44.486880633185756
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation pertinent to 'hallucination' in their output. Recent research has focused on investigating and addressing this problem for a variety of tasks such as biography generation, question answering, abstractive summarization, and dialogue generation. However, the crucial aspect pertaining to 'negation' has remained considerably underexplored. Negation is important because it adds depth and nuance to the understanding of language and is also crucial for logical reasoning and inference. In this work, we address the above limitation and particularly focus on studying the impact of negation in LLM hallucinations. Specifically, we study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'. We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation, which underlines a critical shortcoming of these models. Addressing this problem, we further study numerous strategies to mitigate these hallucinations and demonstrate their impact.
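As a concrete illustration of the 'false premise completion' setting described above, the sketch below probes an open-source chat model with a prompt whose premise contains a negation and is false, then inspects whether the continuation accepts the premise. This is a minimal, assumption-laden example, not the authors' evaluation code; the model checkpoint, prompt, and decoding settings are placeholders chosen for illustration.

```python
# Illustrative sketch of a false-premise-completion probe (not the paper's code).
# Assumptions: a Hugging Face chat checkpoint is available locally; the prompt
# and decoding settings below are placeholders, not the authors' setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed checkpoint; any chat LLM works
)

# False-premise prompt with negation: a non-hallucinating model should correct
# or refuse the premise rather than complete the sentence as if it were true.
prompt = (
    "Complete the sentence: Since the Eiffel Tower is not located in France, "
    "tourists visiting it must travel to"
)

output = generator(prompt, max_new_tokens=60, do_sample=False)
print(output[0]["generated_text"])
# If the continuation names a destination as though the negated premise were
# true, that would count as a hallucination under this task setting.
```

In practice, responses to such probes would be judged (manually or with an automatic checker) on whether the model endorses the false, negated premise, which mirrors the kind of evaluation the abstract describes.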
Related papers
- Negation Blindness in Large Language Models: Unveiling the NO Syndrome in Image Generation [63.064204206220936]
Foundational Large Language Models (LLMs) have changed the way we perceive technology.
They have been shown to excel in tasks ranging from poem writing to coding to essay generation and puzzle solving.
With the incorporation of image generation capability, they have become more comprehensive and versatile AI tools.
Currently identified flaws include hallucination, biases, and bypassing restricted commands to generate harmful content.
arXiv Detail & Related papers (2024-08-27T14:40:16Z) - Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
However, they sometimes generate unfaithful or inconsistent content that deviates from the input source, which can lead to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z) - Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? [53.89380284760555]
Large vision-language models (LVLMs) often produce captions that mention concepts that cannot be found in the image.
These hallucinations erode the trustworthiness of LVLMs and are arguably among the main obstacles to their ubiquitous adoption.
Recent work suggests that addition of grounding objectives -- those that explicitly align image regions or objects to text spans -- reduces the amount of LVLM hallucination.
arXiv Detail & Related papers (2024-06-20T16:56:11Z) - Hallucination Detection and Hallucination Mitigation: An Investigation [13.941799495842776]
Large language models (LLMs) have achieved remarkable successes over the last two years in a range of different applications.
This report aims to present a comprehensive review of the current literature on both hallucination detection and hallucination mitigation.
arXiv Detail & Related papers (2024-01-16T13:36:07Z) - A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models [7.705767540805267]
Large Language Models (LLMs) continue to advance in their ability to write human-like text.
A key challenge remains their tendency to hallucinate: generating content that appears factual but is ungrounded.
This paper presents a survey of over 32 techniques developed to mitigate hallucination in LLMs.
arXiv Detail & Related papers (2024-01-02T17:56:30Z) - A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [40.79317187623401]
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP).
LLMs are prone to hallucination, generating plausible yet nonfactual content.
This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval systems.
arXiv Detail & Related papers (2023-11-09T09:25:37Z) - Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - Exploring the Relationship between LLM Hallucinations and Prompt Linguistic Nuances: Readability, Formality, and Concreteness [6.009751153269125]
We examine how linguistic factors in prompts, specifically readability, formality, and concreteness, influence the occurrence of hallucinations.
Our experimental results suggest that prompts characterized by greater formality and concreteness tend to result in reduced hallucination.
arXiv Detail & Related papers (2023-09-20T05:04:16Z) - Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [116.01843550398183]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks.
LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)