Listen to the Layers: Mitigating Hallucinations with Inter-Layer Disagreement
- URL: http://arxiv.org/abs/2602.09486v1
- Date: Tue, 10 Feb 2026 07:32:37 GMT
- Title: Listen to the Layers: Mitigating Hallucinations with Inter-Layer Disagreement
- Authors: Koduvayur Subbalakshmi, Sabbir Hossain Ujjal, Venkata Krishna Teja Mangichetty, Nastaran Jamalipour Soofi
- Abstract summary: Pretrained Large Language Models (LLMs) are prone to generating fluent yet factually incorrect text, a phenomenon known as hallucinations. We propose a novel, training-free decoding algorithm that mitigates hallucinations at inference time by listening to these signals in the middle layers.
- Score: 0.24443539255794253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained Large Language Models (LLMs) are prone to generating fluent yet factually incorrect text, a phenomenon known as hallucinations, undermining their reliability and utility in downstream tasks. We hypothesize that a generated text span's factuality is correlated with its representational instability across the model's internal layers. Based on this, we propose the CoCoA (Confusion and Consistency Aware) decoder, a novel, training-free decoding algorithm that mitigates hallucinations at inference time by listening to these signals in the middle layers. We propose two metrics to quantify this instability in the middle layers, and use them to penalize outputs that exhibit high internal confusion, thereby steering the model towards more internally consistent and factually grounded outputs. We further propose a self-information gated variant, CoCoA-SIG, that dynamically modulates this penalty to selectively target high-surprise, unstable generations. Extensive experiments on diverse tasks, including question-answering, summarization and code generation, demonstrate that CoCoA significantly improves factual correctness across multiple model families (e.g., Llama-3, Qwen-2.5, Mistral). By leveraging model-intrinsic signals, CoCoA offers an effective and broadly applicable method for enhancing the trustworthiness of LLMs at inference time, without requiring any model retraining.
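The abstract does not spell out the paper's two instability metrics, but the general idea it describes can be sketched: read a next-token distribution from each middle layer (logit-lens style), measure how much each candidate token's probability varies across those layers, and subtract a scaled penalty from the final-layer logits. The following is a minimal NumPy sketch under those assumptions; the function names, the standard-deviation metric, and the surprise-threshold gate for the SIG variant are illustrative choices, not the paper's actual formulation.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def token_instability(layer_logits):
    """Per-token instability: std. dev. of each candidate token's
    probability across the middle layers' logit-lens distributions."""
    probs = softmax(np.asarray(layer_logits, dtype=float))  # (layers, vocab)
    return probs.std(axis=0)                                # (vocab,)

def cocoa_scores(layer_logits, alpha=5.0):
    """Penalize final-layer logits by inter-layer disagreement."""
    logits = np.asarray(layer_logits, dtype=float)
    return logits[-1] - alpha * token_instability(logits)

def cocoa_sig_scores(layer_logits, alpha=5.0, tau=2.0):
    """Self-information gated variant: apply the penalty only to
    high-surprise tokens (the threshold tau is a free parameter)."""
    logits = np.asarray(layer_logits, dtype=float)
    p_final = softmax(logits[-1])
    gate = (-np.log(p_final + 1e-12) > tau).astype(float)
    return logits[-1] - alpha * gate * token_instability(logits)
```

When all layers agree, the penalty vanishes and decoding reduces to ordinary greedy scoring; tokens whose probability swings between layers are pushed down.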
Related papers
- CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models [14.570869250170139]
Large Vision-Language Models (LVLMs) have achieved impressive progress in multi-modal understanding and generation. CoFi-Dec is a training-free decoding framework that mitigates hallucinations by integrating generative self-feedback with coarse-to-fine visual conditioning. Experiments show that CoFi-Dec substantially reduces both entity-level and semantic-level hallucinations, outperforming existing decoding strategies.
arXiv Detail & Related papers (2025-12-29T13:23:20Z)
- QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation [14.312693191309101]
Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to mitigate hallucinations in large language models. We propose QuCo-RAG, which shifts from subjective confidence to objective statistics computed from pre-training data. Our method quantifies uncertainty through two stages: (1) before generation, we identify low-frequency entities indicating long-tail knowledge gaps; (2) during generation, we verify entity co-occurrence in the pre-training corpus, where zero co-occurrence often signals hallucination risk.
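The zero-co-occurrence check described in stage (2) can be sketched with a toy in-memory index; a real system would query statistics over the actual pre-training corpus, and both function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_index(docs):
    """Count how often each entity pair appears in the same document.
    (Toy stand-in for statistics over a real pre-training corpus.)"""
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts

def flag_risky_pairs(generated_entities, counts):
    """Entity pairs with zero corpus co-occurrence signal hallucination risk."""
    pairs = combinations(sorted(set(generated_entities)), 2)
    return [p for p in pairs if counts[p] == 0]
```

For example, if "France" and "Eiffel Tower" never share a document in the indexed corpus, any generation that asserts a relation between them would be flagged for retrieval.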
arXiv Detail & Related papers (2025-12-22T08:28:05Z)
- SIM-CoT: Supervised Implicit Chain-of-Thought [108.30049193668083]
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models. We identify a core latent instability issue when scaling the computational budget of implicit CoT. We propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space.
arXiv Detail & Related papers (2025-09-24T17:01:32Z)
- HAVE: Head-Adaptive Gating and ValuE Calibration for Hallucination Mitigation in Large Language Models [29.677280135028436]
Large Language Models (LLMs) often produce hallucinations in retrieval-augmented or long-context generation. HAVE (Head-Adaptive Gating and ValuE) is a parameter-free decoding framework that exploits head importance and calibrates raw attention weights. HAVE consistently reduces hallucinations and outperforms strong baselines, including DAGCD, with modest overhead.
arXiv Detail & Related papers (2025-09-08T12:06:09Z)
- Rethinking Layer-wise Model Merging through Chain of Merges [21.26982153528304]
Chain of Merges (CoM) is a layer-wise merging procedure that merges weights one layer at a time while incrementally updating activation statistics. Experiments on standard benchmarks demonstrate that CoM achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-08-29T08:44:47Z)
- LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers [53.43862310647276]
Large language models (LLMs) excel at natural language understanding and generation but remain vulnerable to factual errors. We introduce a token-aware, layer-localized contrastive decoding method that aligns specific token types with their most influential transformer layers to improve factual generation. The method requires no additional training or model modification, and experiments demonstrate that it consistently improves factuality across multiple LLMs and various benchmarks.
arXiv Detail & Related papers (2025-07-06T14:35:43Z)
- DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [68.19756761027351]
Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models. We investigate their denoising processes and reinforcement learning methods. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.
arXiv Detail & Related papers (2025-06-25T17:35:47Z)
- Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling [56.26834106704781]
Factual incorrectness in generated content is one of the primary concerns in the ubiquitous deployment of large language models (LLMs). We provide evidence for an internal compass in LLMs that dictates the correctness of factual recall at generation time. Scaling experiments across model sizes and training dynamics highlight that self-awareness emerges rapidly during training and peaks in intermediate layers.
arXiv Detail & Related papers (2025-05-27T16:24:02Z)
- Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification [1.5095869543963976]
Large language models (LLMs) generate confident but incorrect or irrelevant information. Hallucination is a key limitation in their application to complex, open-ended tasks. We investigate how combining Chain-of-Thought (CoT) with retrieval-augmented generation (RAG) can reduce hallucinations.
arXiv Detail & Related papers (2025-05-13T23:57:02Z)
- Lower Layers Matter: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused [27.894293943142447]
Large Language Models (LLMs) have demonstrated exceptional performance across various natural language processing tasks. They occasionally generate inaccurate and counterfactual outputs, a phenomenon commonly referred to as "hallucinations".
arXiv Detail & Related papers (2024-08-16T14:23:59Z)
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z)
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models [79.01926242857613]
Large language models (LLMs) are prone to hallucinations, generating content that deviates from facts seen during pretraining.
We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs.
We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts.
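The contrastive step this blurb describes can be sketched in a few lines: score tokens by the log-ratio of the mature (final) layer's distribution to a premature (earlier) layer's distribution, keeping only tokens that are plausible under the final layer. This is a simplified sketch; the full method also selects the premature layer dynamically per step, which is omitted here, and the constant `beta` is an illustrative choice.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def dola_contrast(premature_logits, mature_logits, beta=0.1):
    """Score tokens by log p_mature - log p_premature, restricted to
    tokens that are plausible under the mature (final) layer."""
    p_mat = softmax(np.asarray(mature_logits, dtype=float))
    p_pre = softmax(np.asarray(premature_logits, dtype=float))
    scores = np.log(p_mat + 1e-12) - np.log(p_pre + 1e-12)
    scores[p_mat < beta * p_mat.max()] = -np.inf  # adaptive plausibility mask
    return scores
```

Intuitively, tokens whose probability grows sharply between the early and final layers reflect knowledge added by the later layers, and the contrast amplifies them.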
arXiv Detail & Related papers (2023-09-07T17:45:31Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can arise from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.