Related papers: RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

URL: http://arxiv.org/abs/2403.01193v3
Date: Wed, 12 Jun 2024 12:00:52 GMT
Title: RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots
Authors: Philip Feldman, James R. Foulds, Shimei Pan,
Abstract summary: ChatGPT's tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This paper explores how Retrieval-Augmented Generation can counter hallucinations by integrating external knowledge with prompts. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding.
Score: 6.893551641325889
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.

Related papers

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge.<n>It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts.<n>This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
Beyond Facts: Evaluating Intent Hallucination in Large Language Models [13.315302240710164]
FAITHQA is a novel benchmark for intent hallucination that contains 20,068 problems.<n>We find that intent hallucination is a common issue even for state-of-the-art models.<n>We introduce an automatic LLM generation evaluation metric, CONSTRAINT SCORE, for detecting intent hallucination.
arXiv Detail & Related papers (2025-06-06T21:10:55Z)
Removal of Hallucination on Hallucination: Debate-Augmented RAG [10.501398822864363]
Debate-Augmented RAG (DRAG) is a training-free framework that integrates Multi-Agent Debate (MAD) mechanisms into both retrieval and generation stages.<n>In retrieval, DRAG employs structured debates among proponents, opponents, and judges to refine retrieval quality and ensure factual reliability.<n>In generation, DRAG introduces asymmetric information roles and adversarial debates, enhancing reasoning robustness and mitigating factual inconsistencies.
arXiv Detail & Related papers (2025-05-24T08:15:22Z)
Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models [0.0]
Hallucinations in large language models (LLMs) present a growing challenge across real-world applications. We propose a prompt-based framework to systematically trigger and quantify hallucination.
arXiv Detail & Related papers (2025-05-01T14:33:47Z)
HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination" This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z)
Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning [19.30729301157088]
We propose a retrieval-based framework that identifies and addresses false premises before generation. Experiments show that this approach effectively reduces hallucinations, improves factual accuracy, and does not require access to model logits or large-scale fine-tuning.
arXiv Detail & Related papers (2025-04-08T21:14:48Z)
The Illusionist's Prompt: Exposing the Factual Vulnerabilities of Large Language Models with Linguistic Nuances [23.908718176644634]
Large Language Models (LLMs) are increasingly relied upon as real-time sources of information by non-expert users. We introduce The Illusionist's Prompt, a novel hallucination attack that incorporates linguistic nuances into adversarial queries. Our attack automatically generates highly transferrable illusory prompts to induce internal factual errors, all while preserving user intent and semantics.
arXiv Detail & Related papers (2025-04-01T07:10:00Z)
DecoPrompt : Decoding Prompts Reduces Hallucinations when Large Language Models Meet False Premises [28.72485319617863]
We propose a new prompting algorithm, named DecoPrompt, to mitigate hallucination. DecoPrompt leverages LLMs to "decode" the false-premise prompts without really eliciting hallucination output from LLMs. We perform experiments on two datasets, demonstrating that DecoPrompt can reduce hallucinations effectively on outputs from different LLMs.
arXiv Detail & Related papers (2024-11-12T00:48:01Z)
Mitigating Entity-Level Hallucination in Large Language Models [11.872916697604278]
This paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD) as a novel method to detect and mitigate hallucinations in Large Language Models (LLMs) Experiment results show that DRAD demonstrates superior performance in both detecting and mitigating hallucinations in LLMs.
arXiv Detail & Related papers (2024-07-12T16:47:34Z)
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks. They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models [68.91592125175787]
Hallucinations pose a significant challenge for the practical implementation of large language models (LLMs) We present Rowen, a novel approach that enhances LLMs with a selective retrieval augmentation process tailored to address hallucinations.
arXiv Detail & Related papers (2024-02-16T11:55:40Z)
RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models [9.465753274663061]
Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs) This paper presents RAGTruth, a corpus tailored for analyzing word-level hallucinations in various domains.
arXiv Detail & Related papers (2023-12-31T04:43:45Z)
Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information. We propose a simple textitInduce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z)
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions [40.79317187623401]
The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP) LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval systems.
arXiv Detail & Related papers (2023-11-09T09:25:37Z)
Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs [67.56087611675606]
Large language models (LLMs) generate summaries that are factually inconsistent with original articles. These hallucinations are challenging to detect through traditional methods. We propose an adversarially DEcoupling method to disentangle the abilities of LLMs (DECENT)
arXiv Detail & Related papers (2023-10-30T08:40:16Z)
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [116.01843550398183]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks. LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models [146.87696738011712]
Large language models (LLMs) are prone to generate hallucinations, i.e., content that conflicts with the source or cannot be verified by the factual knowledge. To understand what types of content and to which extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval)
arXiv Detail & Related papers (2023-05-19T15:36:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.