Fugu-MT Paper Translation (Abstract): Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

Paper summary: Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

arxiv url: http://arxiv.org/abs/2605.18261v1
Date: Mon, 18 May 2026 11:59:31 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:49.51009
Title: Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains
Title (reference translation): Knowledge and Verification: RLVR Exploration for LLMs in Knowledge-Intensive Domains
Authors: Zhonghang Yuan, Zhefan Wang, Fang Hu, Zihong Chen, Jinzhe Li, Gang Li, Jie Ying, Huanjun Kong, Songyang Zhang, Nanqing Dong
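Abstract summary: Reinforcement learning with verifiable rewards (RLVR) has shown promising potential for enhancing the reasoning capabilities of large language models. We propose K2V (Knowledge-to-Verification), a framework that extends RLVR to knowledge-intensive domains through automated verifiable data synthesis.
Reference score (independently computed attention score): 30.599618206614124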
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance the reasoning capabilities of large language models (LLMs) in domains such as mathematics and coding. However, its applications on knowledge-intensive domains have not been effectively explored due to the scarcity of high-quality verifiable data. Furthermore, current RLVR focuses solely on the correctness of final answers, leading to the limitations of flawed reasoning and sparse reward signals. In this work, we propose Knowledge-to-Verification (K2V), a framework that extends RLVR to knowledge-intensive domains through automated verifiable data synthesis, while enabling verification of the LLM's reasoning process. Extensive experiments demonstrate that K2V enhances the reasoning of LLM in knowledge-intensive domains without significantly compromising the model's general capabilities. This study also suggests that integrating automated data synthesis with reasoning verification is a promising direction to enhance model capabilities in these broader domains. Code is available at https://github.com/SeedScientist/K2V.
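Abstract (reference translation): Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential for enhancing the reasoning capabilities of large language models (LLMs) in fields such as mathematics and coding. However, its application to knowledge-intensive domains has not been effectively explored because of a shortage of high-quality verifiable data. In addition, current RLVR focuses only on the correctness of the final answer, which leads to the limitations of flawed reasoning and sparse reward signals. In this work, we propose K2V, a framework that extends RLVR to knowledge-intensive domains through automated verifiable data synthesis while enabling verification of the LLM's reasoning process. Experimental results show that K2V improves LLM reasoning in knowledge-intensive domains without significantly compromising the model's general capabilities. This study suggests that, in these domains, integrating automated data synthesis with reasoning verification is a promising direction for improving model capabilities. The code is available at https://github.com/SeedScientist/K2V.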


Related papers