How to Steer LLM Latents for Hallucination Detection?
- URL: http://arxiv.org/abs/2503.01917v1
- Date: Sat, 01 Mar 2025 19:19:34 GMT
- Title: How to Steer LLM Latents for Hallucination Detection?
- Authors: Seongheon Park, Xuefeng Du, Min-Hsuan Yeh, Haobo Wang, Yixuan Li,
- Abstract summary: We propose a steering vector that reshapes the representation space during inference to separate truthful and hallucinated outputs.<n>Our two-stage framework first trains TSV on a small set of labeled exemplars to form compact and well-separated clusters.<n>It then augments the exemplar set with unlabeled LLM generations, employing an optimal transport-based algorithm for pseudo-labeling combined with a confidence-based filtering process.
- Score: 29.967245405976072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucinations in LLMs pose a significant concern to their safe deployment in real-world applications. Recent approaches have leveraged the latent space of LLMs for hallucination detection, but their embeddings, optimized for linguistic coherence rather than factual accuracy, often fail to clearly separate truthful and hallucinated content. To this end, we propose the Truthfulness Separator Vector (TSV), a lightweight and flexible steering vector that reshapes the LLM's representation space during inference to enhance the separation between truthful and hallucinated outputs, without altering model parameters. Our two-stage framework first trains TSV on a small set of labeled exemplars to form compact and well-separated clusters. It then augments the exemplar set with unlabeled LLM generations, employing an optimal transport-based algorithm for pseudo-labeling combined with a confidence-based filtering process. Extensive experiments demonstrate that TSV achieves state-of-the-art performance with minimal labeled data, exhibiting strong generalization across datasets and providing a practical solution for real-world LLM applications.
Related papers
- CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base [29.477973983931083]
We propose CutPaste&Find, a lightweight and training-free framework for detecting hallucinations in LVLM-generated outputs.<n>At the core of our framework is a Visual-aid Knowledge Base that encodes rich entity-attribute relationships and associated image representations.<n>We introduce a scaling factor to refine similarity scores, mitigating the issue of suboptimal alignment values even for ground-truth image-text pairs.
arXiv Detail & Related papers (2025-02-18T07:06:36Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.<n>LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.<n>Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - Aligning Large Language Models to Follow Instructions and Hallucinate Less via Effective Data Filtering [66.5524727179286]
NOVA is a framework designed to identify high-quality data that aligns well with the learned knowledge to reduce hallucinations.<n>It includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data.<n>To ensure the quality of selected samples, we introduce an expert-aligned reward model, considering characteristics beyond just familiarity.
arXiv Detail & Related papers (2025-02-11T08:05:56Z) - Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding [5.424048651554831]
Internal Fact-based Contrastive Decoding (IFCD) is designed to mitigate and suppress hallucinations during the inference process of Large Visual Language Models (LVLMs)
IFCD calibrates the LVLMs' output and effectively removes the hallucinatory logits from the final predictions.
Experimental results validate that IFCD significantly alleviates both object-level and attribute-level hallucinations while achieving an average 9% accuracy improvement on POPE and 8% accuracy improvement on MME object hallucinations subset compared with direct decoding, respectively.
arXiv Detail & Related papers (2025-02-03T05:08:35Z) - Provenance: A Light-weight Fact-checker for Retrieval Augmented LLM Generation Output [49.893971654861424]
We present a light-weight approach for detecting nonfactual outputs from retrieval-augmented generation (RAG)
We compute a factuality score that can be thresholded to yield a binary decision.
Our experiments show high area under the ROC curve (AUC) across a wide range of relevant open source datasets.
arXiv Detail & Related papers (2024-11-01T20:44:59Z) - SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration [10.970637831760136]
Speculative decoding (SD) has emerged as a widely used paradigm to accelerate the inference of large language models (LLMs)
We introduce SWIFT, an on-the-fly self-speculative decoding algorithm that adaptively selects intermediate layers of LLMs to skip during inference.
We show that SWIFT can achieve over a 1.3x-1.6x speedup while preserving the original distribution of the generated text.
arXiv Detail & Related papers (2024-10-09T14:15:30Z) - DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding [30.30494071474536]
HALC is a novel decoding algorithm designed to mitigate object hallucinations (OH) in large vision-language models (LVLMs)
HALC integrates a robust auto-focal grounding mechanism (locally) to correct hallucinated tokens on the fly, and a specialized beam search algorithm (globally) to significantly reduce OH.
arXiv Detail & Related papers (2024-03-01T10:21:52Z) - OPDAI at SemEval-2024 Task 6: Small LLMs can Accelerate Hallucination
Detection with Weakly Supervised Data [1.3981625092173873]
This paper describes a unified system for hallucination detection of LLMs.
It wins the second prize in the model-agnostic track of the SemEval-2024 Task 6.
arXiv Detail & Related papers (2024-02-20T11:01:39Z) - Mitigating Object Hallucination in Large Vision-Language Models via
Classifier-Free Guidance [56.04768229686853]
Large Vision-Language Models (LVLMs) tend to hallucinate non-existing objects in the images.
We introduce a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE (MARINE)
MARINE is both training-free and API-free, and can effectively and efficiently reduce object hallucinations during the generation process.
arXiv Detail & Related papers (2024-02-13T18:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.