Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
- URL: http://arxiv.org/abs/2510.12137v1
- Date: Tue, 14 Oct 2025 04:31:49 GMT
- Title: Credal Transformer: A Principled Approach for Quantifying and Mitigating Hallucinations in Large Language Models
- Authors: Shihao Ji, Zihui Song, Jiajie Huang
- Abstract summary: Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory.
- Score: 9.660348625678001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) hallucinate, generating factually incorrect yet confident assertions. We argue this stems from the Transformer's Softmax function, which creates "Artificial Certainty" by collapsing ambiguous attention scores into a single probability distribution, discarding uncertainty information at each layer. To fix this, we introduce the Credal Transformer, which replaces standard attention with a Credal Attention Mechanism (CAM) based on evidential theory. CAM produces a "credal set" (a set of distributions) instead of a single attention vector, with the set's size directly measuring model uncertainty. We implement this by re-conceptualizing attention scores as evidence masses for a Dirichlet distribution: sufficient evidence recovers standard attention, while insufficient evidence yields a diffuse distribution, representing ambiguity. Empirically, the Credal Transformer identifies out-of-distribution inputs, quantifies ambiguity, and significantly reduces confident errors on unanswerable questions by abstaining. Our contribution is a new architecture to mitigate hallucinations and a design paradigm that integrates uncertainty quantification directly into the model, providing a foundation for more reliable AI.
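As a rough illustration of the abstract's central idea (attention scores re-read as evidence masses for a Dirichlet distribution, so that low total evidence yields a diffuse attention distribution and an explicit uncertainty signal), here is a minimal sketch. The exponential evidence mapping, the unit base count, and the function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): attention scores re-read as
# evidence masses of a Dirichlet distribution over the keys. The exact
# evidence mapping and names are illustrative assumptions.
import numpy as np

def credal_attention_weights(scores):
    """Map raw attention scores for one query to Dirichlet-based
    attention weights plus a scalar uncertainty in (0, 1].

    scores : (n_keys,) raw query-key scores for one query position.
    """
    n = scores.shape[0]
    # Non-negative evidence per key (one common evidential-NN choice).
    evidence = np.maximum(np.exp(np.clip(scores, -10.0, 10.0)) - 1.0, 0.0)
    # Dirichlet concentration: evidence plus a uniform base count of 1.
    alpha = evidence + 1.0
    strength = alpha.sum()
    # Expected attention distribution under the Dirichlet.
    weights = alpha / strength
    # Vacuity-style uncertainty: high when total evidence is small,
    # which keeps the expected weights close to uniform (1/n each).
    uncertainty = n / strength
    return weights, uncertainty

# Toy usage: strong vs. weak evidence for the same score pattern.
strong = np.array([4.0, 1.0, 0.5, 0.2])
weak = 0.1 * strong
for s in (strong, weak):
    w, u = credal_attention_weights(s)
    print(np.round(w, 3), "uncertainty:", round(float(u), 3))
```

With strong scores the weights concentrate on one key and the uncertainty is small; with weak scores the same pattern produces near-uniform weights and a high uncertainty, which is the diffuse-distribution behavior the abstract attributes to insufficient evidence.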
Related papers
- Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration [52.017716672255524]
Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. We propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.
arXiv Detail & Related papers (2026-02-09T17:24:47Z)
- Proximity-Based Evidence Retrieval for Uncertainty-Aware Neural Networks [6.9681910774977815]
This work proposes an evidence-retrieval mechanism for uncertainty-aware decision-making. For each test instance, exemplars are retrieved in an embedding space; their predictive distributions are fused via Dempster-Shafer theory (a minimal sketch of this fusion rule appears after this entry). Because the supporting evidence is explicit, decisions are transparent and auditable.
arXiv Detail & Related papers (2025-09-11T13:12:22Z)
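The Dempster-Shafer fusion mentioned in the entry above can be illustrated with Dempster's rule of combination. The sketch below is a generic implementation of that rule over toy class masses; the dictionary encoding and the example numbers are illustrative assumptions, not the paper's retrieval pipeline.

```python
# Minimal sketch (not the paper's code): Dempster's rule of combination
# for fusing two basic probability assignments (masses over subsets of
# a frame of discernment). Dict keys are frozensets of class labels.

def dempster_combine(m1, m2):
    """Fuse two mass functions with Dempster's rule, renormalizing
    away the conflicting (empty-intersection) mass."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("Totally conflicting evidence; rule undefined.")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Toy usage: two retrieved exemplars' (hypothetical) predictive masses
# over classes {"cat", "dog"}, each keeping some mass on the full frame
# (ignorance) rather than committing to a single class.
frame = frozenset({"cat", "dog"})
m1 = {frozenset({"cat"}): 0.6, frame: 0.4}
m2 = {frozenset({"cat"}): 0.5, frozenset({"dog"}): 0.2, frame: 0.3}
print(dempster_combine(m1, m2))
```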
- Network Inversion for Generating Confidently Classified Counterfeits [11.599035626374409]
In vision classification, generating inputs that elicit confident predictions is key to understanding model behavior and reliability. We extend network inversion techniques to generate Confidently Classified Counterfeits (CCCs). CCCs offer a model-centric perspective on confidence, revealing that models can assign high confidence to entirely synthetic, out-of-distribution inputs.
arXiv Detail & Related papers (2025-03-26T03:26:49Z)
- Diagnosing and Rectifying Fake OOD Invariance: A Restructured Causal Approach [51.012396632595554]
Invariant representation learning (IRL) encourages the prediction from invariant causal features to labels de-confounded from the environments.
Recent theoretical results verified that some causal features recovered by IRLs merely appear domain-invariant in the training environments but fail in unseen domains.
We develop an approach based on conditional mutual information with respect to RS-SCM, then rigorously rectify the spurious and fake invariant effects.
arXiv Detail & Related papers (2023-12-15T12:58:05Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupted images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models [64.87562101662952]
We show that input tokens are often exchangeable since they already include positional encodings.
We establish the existence of a sufficient and minimal representation of input tokens.
We prove that attention with the desired parameter infers the latent posterior up to an approximation error.
arXiv Detail & Related papers (2022-12-30T17:59:01Z)
- Attention that does not Explain Away [54.42960937271612]
Models based on the Transformer architecture have achieved better accuracy than the ones based on competing architectures for a large set of tasks.
A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows for free information flow at arbitrary distances.
We propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect (see the sketch after this list).
arXiv Detail & Related papers (2020-09-29T21:05:39Z)
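The last entry above describes its doubly-normalized attention as simple to implement. The sketch below uses Sinkhorn-style alternating column/row normalization as an illustrative stand-in for "double normalization"; the function name, iteration count, and normalization order are assumptions and not taken from that paper.

```python
# Illustrative sketch only: one way to "doubly normalize" an attention
# matrix is Sinkhorn-style alternating column/row normalization of
# exp(scores). This conveys the idea of not letting any key be fully
# "explained away"; it is not a claim about the paper's exact scheme.
import numpy as np

def doubly_normalized_attention(scores, n_iters=5, eps=1e-9):
    """scores : (n_queries, n_keys) raw attention logits.
    Returns a matrix whose rows sum to 1 and whose columns receive
    roughly equal total mass across queries."""
    a = np.exp(scores - scores.max())                  # positive, stable kernel
    for _ in range(n_iters):
        a = a / (a.sum(axis=0, keepdims=True) + eps)   # column-normalize
        a = a / (a.sum(axis=1, keepdims=True) + eps)   # row-normalize
    return a

# Toy usage: compare per-key total mass against plain row-wise softmax.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
softmax = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
dna = doubly_normalized_attention(scores)
print("per-key mass, softmax:          ", np.round(softmax.sum(axis=0), 2))
print("per-key mass, doubly-normalized:", np.round(dna.sum(axis=0), 2))
```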
This list is automatically generated from the titles and abstracts of the papers on this site.