SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models
- URL: http://arxiv.org/abs/2502.01812v1
- Date: Mon, 03 Feb 2025 20:42:32 GMT
- Title: SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models
- Authors: Diyana Muhammed, Gollam Rabby, Sören Auer,
- Abstract summary: SelfCheckAgent is a novel framework integrating three different agents.<n>These agents provide a robust multi-dimensional approach to hallucination detection.<n>The framework also incorporates a triangulation strategy, which increases the strengths of the SelfCheckAgent.
- Score: 0.16385815610837165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting hallucinations in Large Language Models (LLMs) remains a critical challenge for their reliable deployment in real-world applications. To address this, we introduce SelfCheckAgent, a novel framework integrating three different agents: the Symbolic Agent, the Specialized Detection Agent, and the Contextual Consistency Agent. These agents provide a robust multi-dimensional approach to hallucination detection. Notable results include the Contextual Consistency Agent leveraging Llama 3.1 with Chain-of-Thought (CoT) to achieve outstanding performance on the WikiBio dataset, with NonFactual hallucination detection scoring 93.64%, Factual 70.26%, and Ranking 78.48% respectively. On the AIME dataset, GPT-4o with CoT excels in NonFactual detection with 94.89% but reveals trade-offs in Factual with 30.58% and Ranking with 30.68%, underscoring the complexity of hallucination detection in the complex mathematical domains. The framework also incorporates a triangulation strategy, which increases the strengths of the SelfCheckAgent, yielding significant improvements in real-world hallucination identification. The comparative analysis demonstrates SelfCheckAgent's applicability across diverse domains, positioning it as a crucial advancement for trustworthy LLMs. These findings highlight the potentiality of consistency-driven methodologies in detecting hallucinations in LLMs.
Related papers
- ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs [50.18087419133284]
hallucination detection methods leveraging hidden states predominantly focus on static and isolated representations.<n>We introduce a novel metric, the ICR Score, which quantifies the contribution of modules to the hidden states' update.<n>We propose a hallucination detection method, the ICR Probe, which captures the cross-layer evolution of hidden states.
arXiv Detail & Related papers (2025-07-22T11:44:26Z) - Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation [78.78421340836915]
We systematically investigate reference-free hallucination detection in open-domain long-form responses.<n>Our findings reveal that internal states are insufficient for reliably distinguishing between factual and hallucinated content.<n>We introduce a new paradigm, named RATE-FT, that augments fine-tuning with an auxiliary task for the model to jointly learn with the main task of hallucination detection.
arXiv Detail & Related papers (2025-05-18T07:10:03Z) - SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs [2.805517909463769]
Large language models (LLMs) are increasingly deployed across diverse domains, yet they are prone to generating factually incorrect outputs.
We introduce a novel and scalable uncertainty-based semantic clustering framework for automated hallucination detection.
arXiv Detail & Related papers (2025-03-07T23:25:19Z) - SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs [12.12344696058016]
We propose SAFE, a novel method for detecting and mitigating hallucinations by leveraging Sparse Autoencoders (SAEs)
Empirical results demonstrate that SAFE consistently improves query generation accuracy and mitigates hallucinations across all datasets, achieving accuracy improvements of up to 29.45%.
arXiv Detail & Related papers (2025-03-04T22:19:52Z) - Good Parenting is all you need -- Multi-agentic LLM Hallucination Mitigation [0.0]
This study explores the ability of Large Language Model (LLM) agents to detect and correct hallucinations in AI-generated content.
Across 4,900 test runs involving various combinations of primary and reviewing agents, advanced AI models demonstrated near-perfect accuracy in identifying hallucinations.
arXiv Detail & Related papers (2024-10-18T08:18:18Z) - Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.
This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences.
We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
arXiv Detail & Related papers (2024-06-19T16:42:57Z) - Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector [114.88975874411142]
Hallucination detection is a challenging task for large language models (LLMs)
We propose an autonomous LLM-based agent framework, called HaluAgent.
In HaluAgent, we integrate the LLM, multi-functional toolbox, and design a fine-grained three-stage detection framework.
arXiv Detail & Related papers (2024-06-17T07:30:05Z) - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [56.00992369295851]
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks, however, they are still far inferior to API-based models when acting as agents.
This paper delivers three key observations: (1) the current agent training corpus is entangled with both formats following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side-effects when improving agent abilities by introducing hallucinations.
We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
arXiv Detail & Related papers (2024-03-19T16:26:10Z) - HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination
Tendency of LLMs [0.0]
Hallucinations pose a significant challenge to the reliability and alignment of Large Language Models (LLMs)
This paper introduces an automated scalable framework that combines benchmarking LLMs' hallucination tendencies with efficient hallucination detection.
The framework is domain-agnostic, allowing the use of any language model for benchmark creation or evaluation in any domain.
arXiv Detail & Related papers (2024-02-25T22:23:37Z) - "Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation [90.09260023184932]
Retrieval-Augmented Generation (RAG) grounds Large Language Model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations.
NoMIRACL is a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages.
We measure relevance assessment using: (i) hallucination rate, measuring model tendency to hallucinate, when the answer is not present in passages in the non-relevant subset, and (ii) error rate, measuring model inaccuracy to recognize relevant passages in the relevant subset.
arXiv Detail & Related papers (2023-12-18T17:18:04Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - Weakly Supervised Veracity Classification with LLM-Predicted Credibility Signals [4.895830603263421]
Pastel is a weakly supervised approach that leverages large language models to extract credibility signals from web content.
We study the association between credibility signals and veracity, and perform a study showing the impact of each signal on model performance.
arXiv Detail & Related papers (2023-09-14T11:06:51Z) - A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of
LLMs by Validating Low-Confidence Generation [76.34411067299331]
Large language models often tend to 'hallucinate' which critically hampers their reliability.
We propose an approach that actively detects and mitigates hallucinations during the generation process.
We show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average.
arXiv Detail & Related papers (2023-07-08T14:25:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.