Do I Really Know? Learning Factual Self-Verification for Hallucination Reduction
- URL: http://arxiv.org/abs/2602.02018v1
- Date: Mon, 02 Feb 2026 12:15:50 GMT
- Title: Do I Really Know? Learning Factual Self-Verification for Hallucination Reduction
- Authors: Enes Altinisik, Masoomali Fatehkia, Fatih Deniz, Nadir Durrani, Majd Hawasly, Mohammad Raza, Husrev Taha Sencar
- Abstract summary: We propose a training-time framework that teaches large language models to reason about factual uncertainty through consistency-based self-verification. Across multiple model families and scales, VeriFY reduces factual hallucination rates by 9.7 to 53.3 percent, with only modest reductions in recall. The source code, training data, and trained model checkpoints will be released upon acceptance.
- Score: 14.310806623700037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Factual hallucination remains a central challenge for large language models (LLMs). Existing mitigation approaches primarily rely on either external post-hoc verification or mapping uncertainty directly to abstention during fine-tuning, often resulting in overly conservative behavior. We propose VeriFY, a training-time framework that teaches LLMs to reason about factual uncertainty through consistency-based self-verification. VeriFY augments training with structured verification traces that guide the model to produce an initial answer, generate and answer a probing verification query, issue a consistency judgment, and then decide whether to answer or abstain. To address the risk of reinforcing hallucinated content when training on augmented traces, we introduce a stage-level loss masking approach that excludes hallucinated answer stages from the training objective while preserving supervision over verification behavior. Across multiple model families and scales, VeriFY reduces factual hallucination rates by 9.7 to 53.3 percent, with only modest reductions in recall (0.4 to 5.7 percent), and generalizes across datasets when trained on a single source. The source code, training data, and trained model checkpoints will be released upon acceptance.
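The stage-level loss masking described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code (the abstract says it will be released upon acceptance): the stage names, the `Stage` dataclass, the per-stage `hallucinated` flag (assumed to come from checking an answer against a gold reference), and the helpers `build_loss_mask` and `masked_nll` are all hypothetical. Answer stages flagged as hallucinated get weight 0, while the verification query, consistency judgment, and answer/abstain decision stay supervised.

```python
from dataclasses import dataclass
import math

# Hypothetical stage names for a VeriFY-style trace (assumed; the released
# code may use different names or a different segmentation).
ANSWER_STAGES = {"initial_answer", "verification_answer"}


@dataclass
class Stage:
    name: str                    # e.g. "initial_answer", "verification_query"
    token_ids: list[int]         # tokens the model emits in this stage
    hallucinated: bool = False   # assumed label: answer judged factually wrong


def build_loss_mask(stages: list[Stage]) -> list[int]:
    """Return one 0/1 weight per token; 0 removes the token from the loss.

    Only answer stages flagged as hallucinated are masked out, so the
    verification query, consistency judgment, and answer/abstain decision
    remain supervised.
    """
    mask: list[int] = []
    for stage in stages:
        drop = stage.name in ANSWER_STAGES and stage.hallucinated
        mask.extend([0 if drop else 1] * len(stage.token_ids))
    return mask


def masked_nll(token_logprobs: list[float], mask: list[int]) -> float:
    """Mean negative log-likelihood over the tokens the mask keeps."""
    if len(token_logprobs) != len(mask):
        raise ValueError("need exactly one log-probability per token")
    kept = [-lp for lp, keep in zip(token_logprobs, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # A trace whose initial answer is wrong: the answer tokens are masked,
    # but the model is still trained to notice the inconsistency and abstain.
    trace = [
        Stage("initial_answer", [11, 12], hallucinated=True),
        Stage("verification_query", [21, 22, 23]),
        Stage("verification_answer", [31], hallucinated=True),
        Stage("consistency_judgment", [41]),
        Stage("final_decision", [51, 52]),
    ]
    mask = build_loss_mask(trace)
    print(mask)  # [0, 0, 1, 1, 1, 0, 1, 1, 1]
    print(round(masked_nll([math.log(0.5)] * len(mask), mask), 4))  # 0.6931
```

Averaging only over kept tokens is one design choice among several; in a typical PyTorch-style loop the same 0/1 mask would instead multiply an unreduced per-token cross-entropy before reduction.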
Related papers
- Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models [59.6715047267181]
Small reasoning models (SRMs) are prone to hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained chain-of-thought evaluation. We propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model.
Date: 2026-02-05T17:15:12Z
- Training Introspective Behavior: Fine-Tuning Induces Reliable Internal State Detection in a 7B Model [0.0]
Lindsey (2025) investigates introspective awareness in language models through four experiments. We focus on the first of these experiments: self-report of injected "thoughts". We show that at least one component of introspective behavior can be directly induced, offering a pathway to built-in AI transparency.
Date: 2025-11-26T13:49:43Z
- Train for Truth, Keep the Skills: Binary Retrieval-Augmented Reward Mitigates Hallucinations [103.16279860448874]
We propose an online reinforcement learning method using a novel binary retrieval-augmented reward (RAR). For open-ended generation, binary RAR achieves a 39.3% reduction in hallucination rates. In short-form question answering, the model learns abstention, strategically outputting "I don't know" when faced with insufficient parametric knowledge.
Date: 2025-10-20T16:45:43Z
- Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations [73.37711261605271]
Hallucination mitigation methods are mainly based on preference alignment and require external human annotations or auxiliary models for preference data collection. We propose Autonomous Preference Alignment via Self-Injection (APASI), a novel and generalizable method that mitigates hallucinations without external dependencies. APASI leverages the target LVLM to self-inject hallucinations into a generated response, creating a pair of responses with varying preference levels.
Date: 2025-09-14T14:26:53Z
- Unsupervised Hallucination Detection by Inspecting Reasoning Processes [53.15199932086543]
Unsupervised hallucination detection aims to identify hallucinated content generated by large language models (LLMs) without relying on labeled data. We propose IRIS, an unsupervised hallucination detection framework, leveraging internal representations intrinsic to factual correctness. Our approach is fully unsupervised, computationally low cost, and works well even with little training data, making it suitable for real-time detection.
Date: 2025-09-12T06:58:17Z
- Analyzing and Mitigating Object Hallucination: A Training Bias Perspective [108.09666587800781]
We introduce a new benchmark, POPEv2, which consists of counterfactual images collected from the training data of LVLMs with certain objects masked. We find that current LVLMs suffer from training bias: they fail to fully leverage their training data and hallucinate more frequently on images seen during training. We propose Obliviate, an efficient and lightweight unlearning method designed to mitigate object hallucination via training bias unlearning.
Date: 2025-08-06T15:51:02Z
- Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation [5.9079338934481225]
We propose mitigating hallucination through knowledge distillation (KD). KD provides smoothed soft labels to a student model, reducing overconfidence and improving factual grounding. Experimental results on summarization benchmarks demonstrate that KD reduces hallucination compared to standard finetuning.
Date: 2025-02-16T23:05:36Z
- On-Policy Self-Alignment with Fine-grained Knowledge Feedback for Hallucination Mitigation [47.35777964373532]
Hallucination occurs when large language models exhibit behavior that deviates from the boundaries of their knowledge during response generation. Previous learning-based methods attempt to finetune models but are limited by off-policy sampling and coarse-grained feedback. We present RLFH, an on-policy self-alignment approach that enables LLMs to actively explore their knowledge boundaries and self-correct generation behavior.
Date: 2024-06-18T02:43:49Z
- A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation [76.34411067299331]
Large language models often tend to 'hallucinate', which critically hampers their reliability. We propose an approach that actively detects and mitigates hallucinations during the generation process. We show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average.
Date: 2023-07-08T14:25:57Z
This list is automatically generated from the titles and abstracts of the papers on this site.