SAC3: Reliable Hallucination Detection in Black-Box Language Models via
Semantic-aware Cross-check Consistency
- URL: http://arxiv.org/abs/2311.01740v2
- Date: Sun, 18 Feb 2024 06:13:47 GMT
- Title: SAC3: Reliable Hallucination Detection in Black-Box Language Models via
Semantic-aware Cross-check Consistency
- Authors: Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley A. Malin, Sricharan
Kumar
- Abstract summary: Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs).
We re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations, arising at 1) the question level and 2) the model level.
We propose a novel sampling-based method, semantic-aware cross-check consistency (SAC3), that expands on the principle of self-consistency checking.
- Score: 11.056236593022978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucination detection is a critical step toward understanding the
trustworthiness of modern language models (LMs). To achieve this goal, we
re-examine existing detection approaches based on the self-consistency of LMs
and uncover two types of hallucinations resulting from 1) question-level and 2)
model-level, which cannot be effectively identified through self-consistency
check alone. Building upon this discovery, we propose a novel sampling-based
method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the
principle of self-consistency checking. Our SAC3 approach incorporates
additional mechanisms to detect both question-level and model-level
hallucinations by leveraging advances including semantically equivalent
question perturbation and cross-model response consistency checking. Through
extensive and systematic empirical analysis, we demonstrate that SAC3
outperforms the state of the art in detecting both non-factual and factual
statements across multiple question-answering and open-domain generation
benchmarks.
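The two mechanisms named in the abstract, semantically equivalent question perturbation and cross-model response consistency checking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the string-match stand-in for a semantic equivalence check, the additive weighting, and the stub models are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical interface: a "model" is any callable mapping a question to an answer.
Model = Callable[[str], str]


def same_answer(a: str, b: str) -> bool:
    """Toy stand-in for a semantic equivalence check (assumption)."""
    return a.strip().lower() == b.strip().lower()


def inconsistency(reference: str, answers: Sequence[str]) -> float:
    """Fraction of answers that disagree with the reference answer."""
    if not answers:
        return 0.0
    return sum(not same_answer(reference, a) for a in answers) / len(answers)


def cross_check_score(
    question: str,
    paraphrases: Sequence[str],
    target: Model,
    verifier: Model,
    n_samples: int = 3,
    weight: float = 1.0,
) -> float:
    """Higher score means the target model's answer is less consistent."""
    reference = target(question)
    # Self-consistency: re-ask the target model the same question.
    self_answers = [target(question) for _ in range(n_samples)]
    # Question level: ask the target model semantically equivalent rephrasings.
    para_answers = [target(p) for p in paraphrases]
    # Model level: ask a second model the original question and its rephrasings.
    verifier_answers = [verifier(q) for q in [question, *paraphrases]]
    target_side = inconsistency(reference, self_answers + para_answers)
    verifier_side = inconsistency(reference, verifier_answers)
    # Assumed combination rule: weighted sum of the two inconsistency rates.
    return target_side + weight * verifier_side


if __name__ == "__main__":
    # Stub models: the target always gives the same answer, so a pure
    # self-consistency check sees no disagreement at all.
    always_no = lambda q: "no"
    always_yes = lambda q: "yes"
    rephrasings = ["rephrasing 1", "rephrasing 2"]
    print(cross_check_score("original question", rephrasings, always_no, always_no))
    print(cross_check_score("original question", rephrasings, always_no, always_yes))
```

In the second call the target model is perfectly self-consistent, yet the score rises because the verifier model disagrees with it. This is the kind of case the abstract says self-consistency checking alone cannot identify.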
Related papers
- Evaluating the Reliability of Self-Explanations in Large Language Models [2.8894038270224867]
We evaluate two kinds of such self-explanations - extractive and counterfactual.
Our findings reveal that, while these self-explanations can correlate with human judgement, they do not fully and accurately follow the model's decision process.
We show that this gap can be bridged because prompting LLMs for counterfactual explanations can produce faithful, informative, and easy-to-verify results.
arXiv Detail & Related papers (2024-07-19T17:41:08Z) - Multiple Instance Verification [11.027466339522777]
We show that naive adaptations of attention-based multiple instance learning methods and standard verification methods are unsuitable for this setting.
Under the CAP framework, we propose two novel attention functions to address the challenge of distinguishing between highly similar instances in a target bag.
arXiv Detail & Related papers (2024-07-09T04:51:22Z) - KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs).
It uses step-wise reasoning, multi-formulation queries, multi-form knowledge for factual checking, and a fusion-based detection mechanism.
Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z) - Chain of Thought Explanation for Dialogue State Tracking [52.015771676340016]
Dialogue state tracking (DST) aims to record user queries and goals during a conversational interaction.
We propose a model named Chain-of-Thought-Explanation (CoTE) for the DST task.
CoTE is designed to create detailed explanations step by step after determining the slot values.
arXiv Detail & Related papers (2024-03-07T16:59:55Z) - A Novel Energy based Model Mechanism for Multi-modal Aspect-Based
Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - A New Benchmark and Reverse Validation Method for Passage-level
Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z) - SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for
Generative Large Language Models [55.60306377044225]
"SelfCheckGPT" is a simple sampling-based approach to fact-check the responses of black-box models.
We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset.
arXiv Detail & Related papers (2023-03-15T19:31:21Z) - A Verification Framework for Component-Based Modeling and Simulation:
Putting the pieces together [0.0]
The proposed verification framework provides methods, techniques and tool support for verifying composability at its different levels.
In particular we focus on the Dynamic-Semantic Composability level due to its significance in the overall composability correctness and also due to the level of difficulty it poses in the process.
arXiv Detail & Related papers (2023-01-08T18:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.