Related papers: SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

URL: http://arxiv.org/abs/2311.01740v2
Date: Sun, 18 Feb 2024 06:13:47 GMT
Title: SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
Authors: Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley A. Malin, Sricharan Kumar
Abstract summary: Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs). We re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations, arising at 1) the question level and 2) the model level. We propose a novel sampling-based method, semantic-aware cross-check consistency (SAC3), that expands on the principle of self-consistency checking.
Score: 11.056236593022978
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs). To achieve this goal, we re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations resulting from 1) question-level and 2) model-level, which cannot be effectively identified through self-consistency check alone. Building upon this discovery, we propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the principle of self-consistency checking. Our SAC3 approach incorporates additional mechanisms to detect both question-level and model-level hallucinations by leveraging advances including semantically equivalent question perturbation and cross-model response consistency checking. Through extensive and systematic empirical analysis, we demonstrate that SAC3 outperforms the state of the art in detecting both non-factual and factual statements across multiple question-answering and open-domain generation benchmarks.
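The abstract describes two extra checks on top of plain self-consistency: asking semantically equivalent rephrasings of the question, and comparing the answers against a second model. The Python sketch below illustrates that idea only and is not the authors' implementation. The names `Model`, `normalized_match`, `inconsistency`, `cross_check_score`, and the `verifier_weight` parameter are all hypothetical, the paraphrases are assumed to be supplied by the caller, and the string-equality check is a stand-in for a real semantic-equivalence judgment.

```python
from typing import Callable, Sequence

# Hypothetical interface: a model maps a question to one sampled answer.
Model = Callable[[str], str]


def normalized_match(a: str, b: str) -> bool:
    """Placeholder semantic check: case- and whitespace-insensitive equality."""
    return " ".join(a.lower().split()) == " ".join(b.lower().split())


def inconsistency(
    reference: str,
    answers: Sequence[str],
    same: Callable[[str, str], bool] = normalized_match,
) -> float:
    """Fraction of answers that disagree with the reference answer."""
    if not answers:
        return 0.0
    return sum(not same(reference, ans) for ans in answers) / len(answers)


def cross_check_score(
    question: str,
    paraphrases: Sequence[str],
    target: Model,
    verifier: Model,
    n_samples: int = 3,
    verifier_weight: float = 1.0,
) -> float:
    """Combine three consistency checks into one score in [0, 1].

    Higher values mean the reference answer is less consistent and therefore
    more likely to be a hallucination. The weighting scheme is an assumption
    made for this sketch.
    """
    reference = target(question)
    # Self-check: resample the target model on the original question.
    self_answers = [target(question) for _ in range(n_samples)]
    # Question-level check: target model on semantically equivalent questions.
    para_answers = [target(q) for q in paraphrases]
    # Model-level check: a second model on the original and rephrased questions.
    cross_answers = [verifier(q) for q in [question, *paraphrases]]

    target_side = inconsistency(reference, self_answers + para_answers)
    verifier_side = inconsistency(reference, cross_answers)
    return (target_side + verifier_weight * verifier_side) / (1.0 + verifier_weight)


if __name__ == "__main__":
    # Stub models: the target is perfectly self-consistent, the verifier disagrees.
    always_yes = lambda q: "Yes"
    always_no = lambda q: "No"
    rephrased = ["Is seven a prime number?"]
    print(cross_check_score("Is 7 prime?", rephrased, always_yes, always_yes))  # 0.0
    print(cross_check_score("Is 7 prime?", rephrased, always_yes, always_no))   # 0.5
```

With the stub models, a target that always gives the same answer scores 0.0 when checked only against itself, and the score rises to 0.5 once a disagreeing verifier is added. This mirrors the abstract's point that some hallucinations cannot be identified through a self-consistency check alone.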

Related papers

A Survey of Multimodal Hallucination Evaluation and Detection [52.03164192840023]
Multi-modal Large Language Models (MLLMs) have emerged as a powerful paradigm for integrating visual and textual information. These models often suffer from hallucination, producing content that appears plausible but contradicts the input content or established world knowledge. This survey offers an in-depth review of hallucination evaluation benchmarks and detection methods across Image-to-Text (I2T) and Text-to-Image (T2I) generation tasks.
arXiv Detail & Related papers (2025-07-25T07:22:42Z)
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs [50.18087419133284]
Hallucination detection methods leveraging hidden states predominantly focus on static and isolated representations. We introduce a novel metric, the ICR Score, which quantifies the contribution of modules to the hidden states' update. We propose a hallucination detection method, the ICR Probe, which captures the cross-layer evolution of hidden states.
arXiv Detail & Related papers (2025-07-22T11:44:26Z)
Token Level Hallucination Detection via Variance in Language Models [0.0]
Large Language Models (LLMs) have demonstrated impressive generative capabilities across diverse tasks but remain susceptible to hallucinations. We introduce a reference-free, token-level hallucination detection framework that leverages the variance in token log-probabilities across multiple generations. Our approach is model-agnostic, interpretable, and suited for real-time or post-hoc analysis.
arXiv Detail & Related papers (2025-07-05T19:20:59Z)
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models [22.43132625619281]
We propose KIE-HVQA, the first benchmark dedicated to evaluating OCR hallucination in degraded document understanding. This dataset includes test samples spanning identity cards and invoices, with simulated real-world degradations for OCR reliability. Experiments on Qwen2.5-VL demonstrate that our 7B-parameter model achieves a 22% absolute improvement in hallucination-free accuracy over GPT-4o.
arXiv Detail & Related papers (2025-06-25T06:44:07Z)
CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection [60.98964268961243]
We propose that guiding models to perform a systematic and comprehensive reasoning process allows models to execute much finer-grained and accurate entailment decisions. We define a 3-step reasoning process, consisting of (i) claim decomposition, (ii) sub-claim attribution and entailment classification, and (iii) aggregated classification, showing that such guided reasoning indeed yields improved hallucination detection.
arXiv Detail & Related papers (2025-06-05T17:02:52Z)
Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection [25.176984317213858]
Large Language Models (LLMs) suffer from hallucination problems, which hinder their reliability in sensitive applications. We propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases.
arXiv Detail & Related papers (2025-02-20T21:06:08Z)
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation enhanced hallucination-detection model, coined as HuDEx. The proposed model provides a novel approach to integrate detection with explanations, and enable both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z)
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models [20.175106988135454]
We introduce a novel Attention-Guided SElf-Reflection (AGSER) approach for zero-shot hallucination detection in Large Language Models (LLMs). The AGSER method utilizes attention contributions to categorize the input query into attentive and non-attentive queries. In addition to its efficacy in detecting hallucinations, AGSER notably reduces computational overhead, requiring only three passes through the LLM and utilizing two sets of tokens.
arXiv Detail & Related papers (2025-01-17T07:30:01Z)
Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation. We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation. We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations: extractive and counterfactual. Our findings reveal that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process. We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z)
Multiple Instance Verification [11.027466339522777]
We show that naive adaptations of attention-based multiple instance learning methods and standard verification methods are unsuitable for this setting. Under the CAP framework, we propose two novel attention functions to address the challenge of distinguishing between highly similar instances in a target bag.
arXiv Detail & Related papers (2024-07-09T04:51:22Z)
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs). It uses step-wise reasoning, multi-formulation queries, multi-form knowledge for factual checking, and a fusion-based detection mechanism. Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z)
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection [90.71323430635593]
We propose a novel self-detection paradigm that considers the comprehensive answer space beyond LLM-generated answers. Building upon this paradigm, we introduce a two-step framework, which firstly instructs LLM to reflect and provide justifications for each candidate answer. This framework can be seamlessly integrated with existing approaches for superior self-detection.
arXiv Detail & Related papers (2024-03-15T02:38:26Z)
A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis. PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information. EPE module models the boundaries pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations. We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks. We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion. We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
A Verification Framework for Component-Based Modeling and Simulation: Putting the pieces together [0.0]
The proposed verification framework provides methods, techniques and tool support for verifying composability at its different levels. In particular we focus on the Dynamic-Semantic Composability level due to its significance in the overall composability correctness and also due to the level of difficulty it poses in the process.
arXiv Detail & Related papers (2023-01-08T18:53:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.