Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
- URL: http://arxiv.org/abs/2502.15845v1
- Date: Thu, 20 Feb 2025 21:06:08 GMT
- Title: Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
- Authors: Yihao Xue, Kristjan Greenewald, Youssef Mroueh, Baharan Mirzasoleiman
- Abstract summary: Large Language Models (LLMs) suffer from hallucination problems, which hinder their reliability in sensitive applications. We propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases.
- Score: 25.176984317213858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) suffer from hallucination problems, which hinder their reliability in sensitive applications. In the black-box setting, several self-consistency-based techniques have been proposed for hallucination detection. We empirically study these techniques and show that they achieve performance close to that of a supervised (still black-box) oracle, suggesting little room for improvement within this paradigm. To address this limitation, we explore cross-model consistency checking between the target model and an additional verifier LLM. With this extra information, we observe improved oracle performance compared to purely self-consistency-based methods. We then propose a budget-friendly, two-stage detection algorithm that calls the verifier model only for a subset of cases. It dynamically switches between self-consistency and cross-consistency based on an uncertainty interval of the self-consistency classifier. We provide a geometric interpretation of consistency-based hallucination detection methods through the lens of kernel mean embeddings, offering deeper theoretical insights. Extensive experiments show that this approach maintains high detection performance while significantly reducing computational cost.
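To make the two-stage procedure concrete, here is a minimal sketch, not the paper's exact implementation: the `sample` helper, the `kernel` similarity function (e.g., cosine similarity over sentence embeddings), and the thresholds are all illustrative assumptions. Self-consistency is scored as the mean pairwise similarity among the target model's samples (empirically, the squared norm of their kernel mean embedding), cross-consistency as the mean similarity between target and verifier samples (empirically, the inner product of the two embeddings), and the verifier is called only when the self-consistency score falls inside an uncertainty interval.

```python
import itertools
import numpy as np

def self_consistency(responses, kernel):
    """Mean pairwise similarity among sampled responses.
    Empirically, the squared norm of the kernel mean embedding
    of the target model's response distribution."""
    pairs = list(itertools.combinations(responses, 2))
    return float(np.mean([kernel(a, b) for a, b in pairs]))

def cross_consistency(responses, verifier_responses, kernel):
    """Mean similarity between target and verifier samples.
    Empirically, the inner product of the two kernel mean embeddings."""
    return float(np.mean([kernel(a, b)
                          for a in responses
                          for b in verifier_responses]))

def detect_hallucination(prompt, target, verifier, sample, kernel,
                         low=0.4, high=0.7, n=5):
    """Two-stage detection: rely on self-consistency when it is decisive,
    and call the verifier only inside the uncertainty interval [low, high].
    Returns True if the response is flagged as a likely hallucination."""
    responses = sample(target, prompt, n)
    s = self_consistency(responses, kernel)
    if s < low:       # confidently inconsistent -> flag
        return True
    if s > high:      # confidently consistent -> accept
        return False
    # Uncertain region: spend budget on the verifier model.
    verifier_responses = sample(verifier, prompt, n)
    c = cross_consistency(responses, verifier_responses, kernel)
    return c < (low + high) / 2
```

In this sketch, `low` and `high` define the uncertainty interval of the self-consistency classifier; tightening the interval reduces verifier calls at the cost of falling back to self-consistency more often.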
Related papers
- SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs [2.805517909463769]
Large language models (LLMs) are increasingly deployed across diverse domains, yet they are prone to generating factually incorrect outputs.
We introduce a novel and scalable uncertainty-based semantic clustering framework for automated hallucination detection (a minimal sketch of this clustering-and-scoring idea appears after this list).
arXiv Detail & Related papers (2025-03-07T23:25:19Z)
- Friend or Foe? Harnessing Controllable Overfitting for Anomaly Detection [30.77558600436759]
Overfitting has long been stigmatized as detrimental to model performance.
We recast overfitting as a controllable and strategic mechanism for enhancing model discrimination capabilities.
We present Controllable Overfitting-based Anomaly Detection (COAD), a novel framework designed to leverage overfitting for optimized anomaly detection.
arXiv Detail & Related papers (2024-11-30T19:07:16Z)
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models.
A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes.
We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z)
- ConsistencyDet: A Robust Object Detector with a Denoising Paradigm of Consistency Model [28.193325656555803]
We introduce a novel framework designed to articulate object detection as a denoising diffusion process.
This framework, termed ConsistencyDet, leverages an innovative denoising concept known as the Consistency Model.
We show that ConsistencyDet surpasses other leading-edge detectors in performance metrics.
arXiv Detail & Related papers (2024-04-11T14:08:45Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency [11.056236593022978]
Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs).
We re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations, arising at 1) the question level and 2) the model level.
We propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the principle of self-consistency checking.
arXiv Detail & Related papers (2023-11-03T06:32:43Z)
- FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator, which synthesizes reflective considerations from a tool-enhanced ChatGPT and a LoRA-tuned Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which consistently demonstrates robust performance with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under simple synthesis strategies, it outperforms existing methods by a large margin and also achieves state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
- Don't Miss Out on Novelty: Importance of Novel Features for Deep Anomaly Detection [64.21963650519312]
Anomaly Detection (AD) is a critical task that involves identifying observations that do not conform to a learned model of normality.
We propose a novel approach to AD using explainability to capture such novel features as unexplained observations in the input space.
Our approach establishes a new state-of-the-art across multiple benchmarks, handling diverse anomaly types.
arXiv Detail & Related papers (2023-10-01T21:24:05Z)
- Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
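The SINdex entry above describes scoring hallucination risk by clustering sampled responses semantically and measuring how scattered they are. Below is a minimal, hedged sketch of that general idea, not the SINdex implementation: it uses a crude lexical similarity (difflib) as a stand-in for a semantic encoder, a greedy threshold clustering, and the entropy of cluster sizes as the inconsistency score; all names and the threshold are illustrative assumptions.

```python
import numpy as np
from difflib import SequenceMatcher

def pairwise_similarity(responses):
    """Crude lexical stand-in for a semantic similarity model."""
    n = len(responses)
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = SequenceMatcher(
                None, responses[i], responses[j]).ratio()
    return sim

def greedy_clusters(sim, threshold=0.8):
    """Assign each response to the first cluster whose exemplar it matches."""
    clusters = []
    for i in range(sim.shape[0]):
        for c in clusters:
            if sim[i, c[0]] >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def inconsistency_index(responses, threshold=0.8):
    """Entropy of the cluster-size distribution: 0 when all samples agree,
    larger when samples scatter across many semantic clusters."""
    clusters = greedy_clusters(pairwise_similarity(responses), threshold)
    p = np.array([len(c) for c in clusters], dtype=float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum())
```

A score near zero means the sampled responses agree; higher scores indicate the kind of semantic scatter that consistency-based detectors treat as a hallucination signal.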