SAC3: Reliable Hallucination Detection in Black-Box Language Models via
Semantic-aware Cross-check Consistency
- URL: http://arxiv.org/abs/2311.01740v2
- Date: Sun, 18 Feb 2024 06:13:47 GMT
- Title: SAC3: Reliable Hallucination Detection in Black-Box Language Models via
Semantic-aware Cross-check Consistency
- Authors: Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley A. Malin, Sricharan
Kumar
- Abstract summary: Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs).
We re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations, arising at 1) the question level and 2) the model level.
We propose a novel sampling-based method, semantic-aware cross-check consistency (SAC3), that expands on the principle of self-consistency checking.
- Score: 11.056236593022978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucination detection is a critical step toward understanding the
trustworthiness of modern language models (LMs). To achieve this goal, we
re-examine existing detection approaches based on the self-consistency of LMs
and uncover two types of hallucinations resulting from 1) question-level and 2)
model-level, which cannot be effectively identified through self-consistency
check alone. Building upon this discovery, we propose a novel sampling-based
method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the
principle of self-consistency checking. Our SAC3 approach incorporates
additional mechanisms to detect both question-level and model-level
hallucinations by leveraging advances including semantically equivalent
question perturbation and cross-model response consistency checking. Through
extensive and systematic empirical analysis, we demonstrate that SAC3
outperforms the state of the art in detecting both non-factual and factual
statements across multiple question-answering and open-domain generation
benchmarks.
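
The abstract names two mechanisms: semantically equivalent question perturbation and cross-model response consistency checking. Below is a minimal Python sketch of how such a cross-check could be assembled around black-box models; the `ask` interface, the paraphrasing and judging prompts, and the simple disagreement score are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a SAC3-style cross-check (illustrative only; the paper
# defines the actual procedure). Assumes black-box models exposed through a
# hypothetical ask(model, prompt) -> str helper.
from typing import Callable, List

Ask = Callable[[str, str], str]  # ask(model_name, prompt) -> response text


def paraphrase_question(ask: Ask, model: str, question: str, k: int = 3) -> List[str]:
    """Generate k semantically equivalent rephrasings of the question."""
    prompt = (
        f"Rephrase the following question {k} times without changing its "
        f"meaning. Return one rephrasing per line.\n\n{question}"
    )
    lines = [ln.strip() for ln in ask(model, prompt).splitlines() if ln.strip()]
    return lines[:k]


def answers_agree(ask: Ask, judge_model: str, question: str, a: str, b: str) -> bool:
    """Use an LLM judge to decide whether two answers convey the same fact."""
    prompt = (
        f"Question: {question}\nAnswer 1: {a}\nAnswer 2: {b}\n"
        "Do these two answers convey the same fact? Reply yes or no."
    )
    return ask(judge_model, prompt).strip().lower().startswith("yes")


def cross_check_score(ask: Ask, target_model: str, verifier_model: str,
                      question: str, original_answer: str) -> float:
    """Fraction of cross-checked answers that disagree with the original.

    Answers are re-sampled for semantically equivalent question variants
    (question-level check) and from a second model (model-level check).
    A higher score suggests the original answer is unstable, i.e. a likely
    hallucination.
    """
    variants = [question] + paraphrase_question(ask, target_model, question)
    agreements = []
    for q in variants:
        for model in (target_model, verifier_model):
            if q == question and model == target_model:
                continue  # this pair would just reproduce the original answer
            candidate = ask(model, q)
            agreements.append(
                answers_agree(ask, verifier_model, question, original_answer, candidate)
            )
    return 1.0 - sum(agreements) / len(agreements)
```

In a real pipeline such a score would be thresholded or calibrated to decide whether to flag or regenerate an answer; the paper evaluates this kind of cross-check on both question-answering and open-domain generation benchmarks.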
Related papers
- Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection [25.176984317213858]
Large Language Models (LLMs) suffer from hallucination problems, which hinder their reliability in sensitive applications.
We propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases (a minimal sketch of this two-stage idea appears after this list).
arXiv Detail & Related papers (2025-02-20T21:06:08Z)
- HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation-enhanced hallucination-detection model, coined HuDEx.
The proposed model provides a novel approach that integrates detection with explanations, enabling both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z)
- Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models [20.175106988135454]
We introduce a novel Attention-Guided SElf-Reflection (AGSER) approach for zero-shot hallucination detection in Large Language Models (LLMs).
The AGSER method utilizes attention contributions to categorize the input query into attentive and non-attentive queries.
In addition to its efficacy in detecting hallucinations, AGSER notably reduces computational overhead, requiring only three passes through the LLM and utilizing two sets of tokens.
arXiv Detail & Related papers (2025-01-17T07:30:01Z)
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations: extractive and counterfactual.
Our findings reveal that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process.
We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z)
- Multiple Instance Verification [11.027466339522777]
We show that naive adaptations of attention-based multiple instance learning methods and standard verification methods are unsuitable for this setting.
Under the CAP framework, we propose two novel attention functions to address the challenge of distinguishing between highly similar instances in a target bag.
arXiv Detail & Related papers (2024-07-09T04:51:22Z)
- KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs).
It uses step-wise reasoning, multi-formulation queries, multi-form knowledge for factual checking, and a fusion-based detection mechanism.
Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z)
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection [90.71323430635593]
We propose a novel self-detection paradigm that considers the comprehensive answer space beyond LLM-generated answers.
Building upon this paradigm, we introduce a two-step framework, which first instructs the LLM to reflect on and provide justifications for each candidate answer.
This framework can be seamlessly integrated with existing approaches for superior self-detection.
arXiv Detail & Related papers (2024-03-15T02:38:26Z)
- A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion (see the sketch after this list).
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
- A Verification Framework for Component-Based Modeling and Simulation: Putting the pieces together [0.0]
The proposed verification framework provides methods, techniques and tool support for verifying composability at its different levels.
In particular, we focus on the Dynamic-Semantic Composability level, due to its significance for overall composability correctness and the difficulty it poses in the process.
arXiv Detail & Related papers (2023-01-08T18:53:28Z)
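
The "Verify when Uncertain" entry above describes a budget-friendly, two-stage detector that calls a verifier model only for a subset of cases. The sketch below illustrates that general idea under stated assumptions: the first stage is a cheap sampled self-consistency score and the second stage is an expensive `verify` call; both helpers and the threshold are hypothetical, not taken from the cited paper.

```python
# Hedged sketch of a two-stage, budget-aware hallucination check. The
# sampling-based first stage, the exact-match agreement rule, and the
# verify() fallback are illustrative assumptions.
from collections import Counter
from typing import Callable


def self_consistency(sample: Callable[[str], str], question: str, n: int = 5) -> float:
    """Share of n sampled answers that agree with the most common answer."""
    answers = [sample(question).strip().lower() for _ in range(n)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n


def two_stage_detect(sample: Callable[[str], str],
                     verify: Callable[[str, str], bool],
                     question: str, answer: str,
                     threshold: float = 0.8) -> bool:
    """Return True if the answer is flagged as a likely hallucination.

    The expensive verifier is consulted only when the cheap self-consistency
    signal is uncertain, keeping the per-query budget low.
    """
    if self_consistency(sample, question) >= threshold:
        return False  # consistent sampling: accept without calling the verifier
    return not verify(question, answer)  # uncertain case: escalate to the verifier
```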
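
Similarly, the reverse-validation entry ("A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection") proposes a zero-resource self-check; a rough reading is to regenerate the question from the model's own passage and test whether it still matches the original. The prompts and the equivalence test below are assumptions made for illustration, not the paper's exact procedure.

```python
# Hedged sketch of a reverse-validation style self-check: reconstruct the
# question from the generated passage and compare it with the original.
from typing import Callable


def reverse_validate(ask: Callable[[str], str], question: str, passage: str) -> bool:
    """Return True if the passage still appears consistent with the question."""
    reconstructed = ask(
        "Here is a passage:\n" + passage +
        "\n\nWrite the single question this passage most directly answers."
    )
    judgement = ask(
        f"Question A: {question}\nQuestion B: {reconstructed}\n"
        "Do these two questions ask for the same information? Reply yes or no."
    )
    return judgement.strip().lower().startswith("yes")
```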