KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
- URL: http://arxiv.org/abs/2506.19807v3
- Date: Wed, 08 Oct 2025 16:56:59 GMT
- Title: KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
- Authors: Baochang Ren, Shuofei Qiao, Da Zheng, Huajun Chen, Ningyu Zhang
- Abstract summary: Large Language Models (LLMs) often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. We propose Knowledge-enhanced RL (KnowRL) to address the high hallucination rate in slow-thinking models. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination rate in slow-thinking models, we propose Knowledge-enhanced RL (KnowRL). KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.
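The abstract describes the mechanism but not its implementation. The minimal sketch below illustrates the general shape of such a reward, combining an outcome reward with a verification-based factuality term over the reasoning trace; `extract_atomic_claims`, `KnowledgeBase`, and the weight `alpha` are hypothetical stand-ins for illustration, not the authors' components.

```python
# Illustrative sketch of a factuality-shaped RL reward; not the KnowRL code.
from dataclasses import dataclass


@dataclass
class KnowledgeBase:
    facts: set

    def supports(self, claim: str) -> bool:
        # Stand-in for real knowledge verification (e.g. retrieval plus
        # entailment against an external source); here, set membership.
        return claim in self.facts


def extract_atomic_claims(reasoning: str) -> list:
    # Stand-in: a real system would decompose the chain of thought into
    # independently checkable atomic statements.
    return [s.strip() for s in reasoning.split(".") if s.strip()]


def factuality_reward(reasoning: str, kb: KnowledgeBase) -> float:
    """Fraction of atomic claims in the reasoning that the knowledge
    base verifies; 0.0 if no checkable claim was produced."""
    claims = extract_atomic_claims(reasoning)
    if not claims:
        return 0.0
    return sum(kb.supports(c) for c in claims) / len(claims)


def total_reward(reasoning: str, answer_correct: bool,
                 kb: KnowledgeBase, alpha: float = 0.5) -> float:
    # Outcome reward plus factual supervision over the thinking process;
    # alpha is an assumed mixing weight, not a published value.
    return float(answer_correct) + alpha * factuality_reward(reasoning, kb)
```

Under this kind of shaping, a rollout that reaches the correct answer through unsupported claims scores lower than one whose intermediate steps verify, which is the behavior the abstract attributes to KnowRL.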
Related papers
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
Large language models (LLMs) are prone to hallucination and untruthful responses, which presents a fundamental challenge for existing methods. We present TruthRL, a general reinforcement learning framework that directly optimizes the truthfulness of LLMs.
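The summary leaves the reward design implicit; one natural formulation of a truthfulness objective, consistent with this description, is a ternary reward that rewards correct answers, penalizes hallucinated ones, and leaves honest abstentions neutral. The sketch below is an illustrative reading, not the TruthRL implementation, and `abstain_marker` is a hypothetical convention.

```python
def ternary_truth_reward(answer: str, gold: str,
                         abstain_marker: str = "I don't know") -> int:
    # Abstention is neither rewarded nor punished, so the policy is not
    # pressured to guess when it lacks the knowledge.
    if answer.strip().lower() == abstain_marker.lower():
        return 0
    # Correct answers earn +1; confident but wrong answers
    # (hallucinations) earn -1.
    return 1 if answer.strip() == gold.strip() else -1
```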
arXiv Detail & Related papers (2025-09-30T04:25:17Z)
- From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
We show that LLMs can acquire genuinely new skills during RL by composing existing ones. Our experiments show that a compositional skill acquired on a source task transfers to a different target task. This transfer happens even without compositional training on the target, requiring only prior knowledge of the target's atomic skills.
arXiv Detail & Related papers (2025-09-29T17:44:27Z)
- Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
We introduce a fine-grained analytic framework to dissect the impact of reinforcement learning on reasoning. Our framework specifically investigates key elements that have been hypothesized to benefit from RL training.
arXiv Detail & Related papers (2025-06-05T07:53:59Z)
- The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models
Large language models (LLMs) have advanced significantly in reasoning tasks through reinforcement learning (RL) optimization. However, reasoning-oriented RL fine-tuning markedly increases the prevalence of hallucinations. We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
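The summary names explicit, step-wise factuality verification inside RL fine-tuning without giving the update rule. As a hedged illustration of what step-wise factuality shaping could look like, the sketch below adds a per-step bonus or penalty to the RL advantages; `verify_step` and `bonus` are hypothetical stand-ins, not the FSPO algorithm itself.

```python
from typing import Callable, List


def shape_advantages(steps: List[str], advantages: List[float],
                     verify_step: Callable[[str], bool],
                     bonus: float = 0.2) -> List[float]:
    # Add a factuality bonus to verified reasoning steps and subtract it
    # from unverified ones, so the policy gradient favors supported steps.
    return [adv + (bonus if verify_step(step) else -bonus)
            for step, adv in zip(steps, advantages)]
```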
arXiv Detail & Related papers (2025-05-30T14:23:32Z)
- The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination
We introduce a novel framework to quantify factual hallucinations by modeling knowledge overshadowing. We also propose a new decoding strategy, CoDa, to mitigate hallucinations, which notably enhances model factuality on Overshadow (27.9%), MemoTrap (13.1%), and NQ-Swap (18.3%).
arXiv Detail & Related papers (2025-02-22T08:36:06Z)
- RLInspect: An Interactive Visual Approach to Assess Reinforcement Learning Algorithm
Reinforcement Learning (RL) is a rapidly growing area of machine learning.
Assessing RL models can be challenging, which makes it difficult to interpret their behaviour.
We have developed RLInspect, an interactive visual analytic tool.
It takes into account different components of the RL model (state, action, agent architecture, and reward) and provides a more comprehensive view of RL training.
arXiv Detail & Related papers (2024-11-13T07:24:14Z)
- Abstracted Trajectory Visualization for Explainability in Reinforcement Learning
Explainable AI (XAI) has demonstrated the potential to help reinforcement learning (RL) practitioners to understand how RL models work.
XAI for users who do not have RL expertise (non-RL experts) has not been studied sufficiently.
We argue that abstracted trajectories, which depict transitions between the major states of the RL model, help non-RL experts build a mental model of the agents.
arXiv Detail & Related papers (2024-02-05T21:17:44Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning
It is argued that the commonly used action-matching principle is more an explanation of deep neural networks (DNNs) than an interpretation of RL agents. We therefore propose to take rewards, the essential objective of RL agents, as the essential objective for interpreting them.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- A User Study on Explainable Online Reinforcement Learning for Adaptive Systems
Online reinforcement learning (RL) is increasingly used for realizing adaptive systems in the presence of design time uncertainty.
With deep RL gaining interest, the learned knowledge is no longer explicitly represented; instead, it is encoded in a neural network.
XRL-DINE provides visual insights into why certain decisions were made at important time points.
arXiv Detail & Related papers (2023-07-09T05:12:42Z)
- A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges
Reinforcement Learning (RL) is a popular machine learning paradigm where intelligent agents interact with the environment to fulfill a long-term goal.
Despite the encouraging results achieved, the deep neural network backbone is widely regarded as a black box that prevents practitioners from trusting and employing trained agents in realistic scenarios where high security and reliability are essential. To alleviate this issue, a large body of literature has been devoted to shedding light on the inner workings of intelligent agents, either by constructing intrinsic interpretability or by providing post-hoc explainability.
arXiv Detail & Related papers (2022-11-12T13:52:06Z)
- Explaining Online Reinforcement Learning Decisions of Self-Adaptive Systems
Design time uncertainty poses an important challenge when developing a self-adaptive system.
Online reinforcement learning is an emerging approach to realizing self-adaptive systems in the presence of design time uncertainty.
Deep RL represents learned knowledge as a neural network, which allows it to generalize over unseen inputs.
arXiv Detail & Related papers (2022-10-12T05:38:27Z)
- Explainability in Deep Reinforcement Learning
We review recent works in the direction of attaining Explainable Reinforcement Learning (XRL). In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight into the inner workings of what is still considered a black box.
arXiv Detail & Related papers (2020-08-15T10:11:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.