Related papers: Learning to Reason for Factuality

Learning to Reason for Factuality

URL: http://arxiv.org/abs/2508.05618v1
Date: Thu, 07 Aug 2025 17:57:09 GMT
Title: Learning to Reason for Factuality
Authors: Xilun Chen, Ilia Kulikov, Vincent-Pierre Berges, Barlas Oğuz, Rulin Shao, Gargi Ghosh, Jason Weston, Wen-tau Yih,
Abstract summary: We propose a novel reward function that simultaneously considers the factual precision, response detail level, and answer relevance.<n>Our model achieves an average reduction of 23.1 percentage points in hallucination rate, a 23% increase in answer detail level, and no degradation in the overall response helpfulness.
Score: 48.08503522255537
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reasoning Large Language Models (R-LLMs) have significantly advanced complex reasoning tasks but often struggle with factuality, generating substantially more hallucinations than their non-reasoning counterparts on long-form factuality benchmarks. However, extending online Reinforcement Learning (RL), a key component in recent R-LLM advancements, to the long-form factuality setting poses several unique challenges due to the lack of reliable verification methods. Previous work has utilized automatic factuality evaluation frameworks such as FActScore to curate preference data in the offline RL setting, yet we find that directly leveraging such methods as the reward in online RL leads to reward hacking in multiple ways, such as producing less detailed or relevant responses. We propose a novel reward function that simultaneously considers the factual precision, response detail level, and answer relevance, and applies online RL to learn high quality factual reasoning. Evaluated on six long-form factuality benchmarks, our factual reasoning model achieves an average reduction of 23.1 percentage points in hallucination rate, a 23% increase in answer detail level, and no degradation in the overall response helpfulness.

Related papers

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective [82.24301452333577]
Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning.<n>A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains.<n>We introduce Guru, a curated RL reasoning corpus of 92K verifiable examples spanning six reasoning domains.
arXiv Detail & Related papers (2025-06-17T20:24:00Z)
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions [28.962415274754537]
Large language model (LLM) reasoning has shown that sophisticated behaviors such as planning and self-reflection can emerge through reinforcement learning (RL)<n>We introduce a novel training approach, textbfReLIFT (textbfReinforcement textbfL textbfInterleaved with Online textbfFine-textbfTuning)<n>In ReLIFT, the model is primarily trained using RL, but when it encounters challenging questions, high-quality solutions are collected for fine-tuning, and the training process alternate
arXiv Detail & Related papers (2025-06-09T08:11:20Z)
The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models [63.98194996746229]
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization.<n>However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations.<n>We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv Detail & Related papers (2025-05-30T14:23:32Z)
Reinforced Informativeness Optimization for Long-Form Retrieval-Augmented Generation [77.10390725623125]
Long-form question answering (LFQA) presents unique challenges for large language models.<n>RioRAG is a novel reinforcement learning framework that advances long-form RAG through reinforced informativeness optimization.
arXiv Detail & Related papers (2025-05-27T07:34:41Z)
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning [50.02117478165099]
We show that large-scale reinforcement learning can significantly enhance the reasoning capabilities of strong, small- and mid-sized models.<n>We propose a simple yet effective approach: first training on math-only prompts, then on code-only prompts.
arXiv Detail & Related papers (2025-05-22T08:50:47Z)
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts [6.810019560977178]
We introduce VeriFact, a factuality evaluation framework designed to enhance fact extraction.<n>We also introduce FactRBench, a benchmark that evaluates both precision and recall in long-form model responses.<n> Empirical evaluations show that VeriFact significantly enhances fact completeness and preserves complex facts with critical relational information.
arXiv Detail & Related papers (2025-05-14T18:02:37Z)
Concise Reasoning via Reinforcement Learning [13.657506042120167]
We revisit the core principles of reinforcement learning (RL)<n>We uncover a natural correlation between conciseness and accuracy that has been largely overlooked.<n>We show that introducing a secondary phase of RL training, using a very small set of problems, can significantly reduce chains of thought.
arXiv Detail & Related papers (2025-04-07T15:35:54Z)
A Long Way to Go: Investigating Length Correlations in RLHF [59.49656695716066]
This paper demonstrates, on three diverse settings, that optimizing for response length is a significant factor behind RLHF. We find improvements in reward to largely be driven by increasing response length, instead of other features. Even a purely length-based reward reproduces most downstream RLHF improvements over supervised fine-tuned models.
arXiv Detail & Related papers (2023-10-05T17:38:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.