FaithLens: Detecting and Explaining Faithfulness Hallucination
- URL: http://arxiv.org/abs/2512.20182v1
- Date: Tue, 23 Dec 2025 09:20:32 GMT
- Title: FaithLens: Detecting and Explaining Faithfulness Hallucination
- Authors: Shuzheng Si, Qingyi Wang, Haozhe Zhao, Yuzhuo Bai, Guanqiao Chen, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
- Abstract summary: We introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model. We apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
- Score: 63.905100627300925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.
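The abstract mentions rule-based reinforcement learning with rewards for both prediction correctness and explanation quality. The following is a minimal sketch of how such a combined reward could be wired together, not the authors' implementation. The label encoding, the `DetectorOutput` type, the `quality_weight` parameter, the choice to score explanations only when the prediction is correct, and the word-count quality heuristic are all assumptions made for illustration; the abstract does not specify how explanation quality is scored.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DetectorOutput:
    """Hypothetical container for one detector response."""
    label: int          # assumed encoding: 1 = faithful, 0 = hallucinated
    explanation: str    # free-text justification for the label


def prediction_reward(pred_label: int, gold_label: int) -> float:
    """Rule-based correctness reward: 1.0 on an exact label match, else 0.0."""
    return 1.0 if pred_label == gold_label else 0.0


def word_count_quality(explanation: str) -> float:
    """Placeholder quality score in [0, 1] (an assumption, not from the paper).

    It only checks that the explanation is non-trivial in length and
    saturates at 20 words.
    """
    return min(len(explanation.split()), 20) / 20.0


def combined_reward(
    output: DetectorOutput,
    gold_label: int,
    quality_fn: Callable[[str], float] = word_count_quality,
    quality_weight: float = 0.5,  # hypothetical weighting
) -> float:
    """Sum the correctness reward and a weighted explanation-quality reward.

    Assumption: the explanation is scored only when the prediction is
    correct, so a fluent explanation cannot compensate for a wrong label.
    """
    correct = prediction_reward(output.label, gold_label)
    quality = quality_fn(output.explanation) if correct else 0.0
    return correct + quality_weight * quality


if __name__ == "__main__":
    sample = DetectorOutput(label=0, explanation="The claim adds a date absent from the source document.")
    print(combined_reward(sample, gold_label=0))
```

Passing a different `quality_fn` swaps the placeholder heuristic for any other scorer that maps an explanation to a value in [0, 1].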
Related papers
- CVeDRL: An Efficient Code Verifier via Difficulty-aware Reinforcement Learning [57.24524263804788]
Code verifiers play a critical role in post-verification for LLM-based code generation. Existing supervised fine-tuning methods suffer from data scarcity, high failure rates, and poor inference efficiency. We show that naive RL with only functionality rewards fails to generate effective unit tests for difficult branches and samples.
arXiv Detail & Related papers (2026-01-30T10:33:29Z)
- Lie to Me: Knowledge Graphs for Robust Hallucination Self-Detection in LLMs [0.0]
We examine the use of structured knowledge representations, namely knowledge graphs, to improve hallucination self-detection. Our results show that LLMs can better analyse atomic facts when they are structured as knowledge graphs. This low-cost, model-agnostic approach paves the way toward safer and more trustworthy language models.
arXiv Detail & Related papers (2025-12-29T15:41:13Z)
- Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models [83.24079543652253]
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization. However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv Detail & Related papers (2025-05-30T14:23:32Z)
- Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning [80.27561080938747]
CANOE is a framework to reduce faithfulness hallucinations in large language models across different downstream tasks without human annotations. Dual-GRPO is a rule-based reinforcement learning method that includes three tailored rule-based rewards derived from synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different tasks, even outperforming the most advanced LLMs.
arXiv Detail & Related papers (2025-05-22T10:10:07Z)
- Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization [35.269343563526675]
We propose RHIO, a framework to teach large language models to explicitly discriminate between faithful and unfaithful generations. RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads. These samples are incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens.
arXiv Detail & Related papers (2025-01-23T11:23:25Z)
- Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.
This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences.
We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
arXiv Detail & Related papers (2024-06-19T16:42:57Z)
- More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness [24.843692458375436]
This study investigates how models aligned with general-purpose preference data perform across five trustworthiness verticals. Our results demonstrate that RLHF on human preferences doesn't automatically guarantee trustworthiness, and reverse effects are often observed. We propose to adapt efficient influence function based data attribution methods to the RLHF setting to better understand the influence of fine-tuning data on individual trustworthiness benchmarks.
arXiv Detail & Related papers (2024-04-29T17:00:53Z)
- Improving Factual Consistency of News Summarization by Contrastive Preference Optimization [65.11227166319546]
Large language models (LLMs) generate summaries that are factually inconsistent with original articles. These hallucinations are challenging to detect through traditional methods. We propose Contrastive Preference Optimization (CPO) to disentangle the LLMs' propensities to generate faithful and fake content.
arXiv Detail & Related papers (2023-10-30T08:40:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.