Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
- URL: http://arxiv.org/abs/2510.12137v1
- Date: Tue, 14 Oct 2025 04:31:49 GMT
- Title: Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
- Authors: Shihao Ji, Zihui Song, Jiajie Huang
- Abstract summary: Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory.
- Score: 9.660348625678001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer's Softmax function, which creates "Artificial Certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "credal set" (a set of distributions) instead of a single attention vector, with the set's size directly measuring model uncertainty. We implement this by re-conceptualizing attention scores as evidence masses for a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence yields a diffuse distribution, representing ambiguity. Empirically, the Credal Transformer identifies out-of-distribution inputs, quantifies ambiguity, and significantly reduces confident errors on unanswerable questions by abstaining. Our contribution is a new architecture to mitigate hallucinations and a design paradigm that integrates uncertainty quantification directly into the model, providing a foundation for more reliable AI.
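As a rough illustration of the abstract's central idea (attention scores re-read as evidence masses for a Dirichlet distribution, so that low total evidence yields a diffuse attention distribution and an explicit uncertainty signal), here is a minimal sketch. The exponential evidence mapping, the unit base count, and the function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): attention scores re-read as
# evidence masses of a Dirichlet distribution over the keys. The exact
# evidence mapping and names are illustrative assumptions.
import numpy as np

def credal_attention_weights(scores):
    """Map raw attention scores for one query to Dirichlet-based
    attention weights plus a scalar uncertainty in (0, 1].

    scores : (n_keys,) raw query-key scores for one query position.
    """
    n = scores.shape[0]
    # Non-negative evidence per key (one common evidential-NN choice).
    evidence = np.maximum(np.exp(np.clip(scores, -10.0, 10.0)) - 1.0, 0.0)
    # Dirichlet concentration: evidence plus a uniform base count of 1.
    alpha = evidence + 1.0
    strength = alpha.sum()
    # Expected attention distribution under the Dirichlet.
    weights = alpha / strength
    # Vacuity-style uncertainty: high when total evidence is small,
    # which keeps the expected weights close to uniform (1/n each).
    uncertainty = n / strength
    return weights, uncertainty

# Toy usage: strong vs. weak evidence for the same score pattern.
strong = np.array([4.0, 1.0, 0.5, 0.2])
weak = 0.1 * strong
for s in (strong, weak):
    w, u = credal_attention_weights(s)
    print(np.round(w, 3), "uncertainty:", round(float(u), 3))
```

With strong scores the weights concentrate on one key and the uncertainty is small; with weak scores the same pattern produces near-uniform weights and a high uncertainty, which is the diffuse-distribution behavior the abstract attributes to insufficient evidence.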
Related papers
- Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration [52.017716672255524]
Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. We propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.
arXiv Detail & Related papers (2026-02-09T17:24:47Z)
- Proximity-Based Evidence Retrieval for Uncertainty-Aware Neural Networks [6.9681910774977815]
This work proposes an evidence-retrieval mechanism for uncertainty-aware decision-making. For each test instance, exemplars are retrieved in an embedding space; their predictive distributions are fused via Dempster-Shafer theory (a minimal sketch of this fusion rule appears after this entry). Because the supporting evidence is explicit, decisions are transparent and auditable.
arXiv Detail & Related papers (2025-09-11T13:12:22Z)
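The Dempster-Shafer fusion mentioned in the entry above can be illustrated with Dempster's rule of combination. The sketch below is a generic implementation of that rule over toy class masses; the dictionary encoding and the example numbers are illustrative assumptions, not the paper's retrieval pipeline.

```python
# Minimal sketch (not the paper's code): Dempster's rule of combination
# for fusing two basic probability assignments (masses over subsets of
# a frame of discernment). Dict keys are frozensets of class labels.

def dempster_combine(m1, m2):
    """Fuse two mass functions with Dempster's rule, renormalizing
    away the conflicting (empty-intersection) mass."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence; rule undefined.")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Toy usage: two retrieved exemplars' (hypothetical) predictive masses
# over classes {"cat", "dog"}, each keeping some mass on the full frame
# (ignorance) rather than committing to a single class.
frame = frozenset({"cat", "dog"})
m1 = {frozenset({"cat"}): 0.6, frame: 0.4}
m2 = {frozenset({"cat"}): 0.5, frozenset({"dog"}): 0.2, frame: 0.3}
print(dempster_combine(m1, m2))
```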
- Network Inversion for Generating Confidently Classified Counterfeits [11.599035626374409]
In vision classification, generating inputs that elicit confident predictions is key to understanding model behavior and reliability. We extend network inversion techniques to generate Confidently Classified Counterfeits (CCCs). CCCs offer a model-centric perspective on confidence, revealing that models can assign high confidence to entirely synthetic, out-of-distribution inputs.
arXiv Detail & Related papers (2025-03-26T03:26:49Z)
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRLs merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupted images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models [64.87562101662952]
We show that input tokens are often exchangeable since they already include positional encodings.
We establish the existence of a sufficient and minimal representation of input tokens.
We prove that attention with the desired parameter infers the latent posterior up to an approximation error.
arXiv Detail & Related papers (2022-12-30T17:59:01Z)
- Attention that does not Explain Away [54.42960937271612]
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.
A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances.
We propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect (see the sketch after this list).
arXiv Detail & Related papers (2020-09-29T21:05:39Z)
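The last entry above describes its doubly-normalized attention as simple to implement. The sketch below uses Sinkhorn-style alternating column/row normalization as an illustrative stand-in for "double normalization"; the function name, iteration count, and normalization order are assumptions and not taken from that paper.

```python
# Illustrative sketch only: one way to "doubly normalize" an attention
# matrix is Sinkhorn-style alternating column/row normalization of
# exp(scores). This conveys the idea of not letting any key be fully
# "explained away"; it is not a claim about the paper's exact scheme.
import numpy as np

def doubly_normalized_attention(scores, n_iters=5, eps=1e-9):
    """scores : (n_queries, n_keys) raw attention logits.
    Returns a matrix whose rows sum to 1 and whose columns receive
    roughly equal total mass across queries."""
    a = np.exp(scores - scores.max())                  # positive, stable kernel
    for _ in range(n_iters):
        a = a / (a.sum(axis=0, keepdims=True) + eps)   # column-normalize
        a = a / (a.sum(axis=1, keepdims=True) + eps)   # row-normalize
    return a

# Toy usage: compare per-key total mass against plain row-wise softmax.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
softmax = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
dna = doubly_normalized_attention(scores)
print("per-key mass, softmax:          ", np.round(softmax.sum(axis=0), 2))
print("per-key mass, doubly-normalized:", np.round(dna.sum(axis=0), 2))
```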
This list is automatically generated from the titles and abstracts of the papers on this site.