Fugu-MT Paper Translation (Abstract): Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs

Paper overview: Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs

arxiv url: http://arxiv.org/abs/2510.20691v2
Date: Mon, 27 Oct 2025 07:30:51 GMT
Status: Translation complete
Last updated in system: 2025-10-28 13:14:10.63418
Title: Plan Then Retrieve: Reinforcement Learning-Guided Complex Reasoning over Knowledge Graphs
Title (reference translation): Reinforcement learning-guided complex reasoning using knowledge graphs
Authors: Yanlin Song, Ben Liu, Víctor Gutiérrez-Basulto, Zhiwei Hu, Qianqian Xie, Min Peng, Sophia Ananiadou, Jeff Z. Pan
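Abstract summary: Graph-RFT is a two-stage reinforcement fine-tuning KGQA framework with a 'plan-KGsearch-and-Websearch-during-think' paradigm. It enables LLMs to perform autonomous planning and adaptive retrieval scheduling across KG and web sources under incomplete knowledge conditions.
Reference score (attention score computed independently by this site): 52.16166558205338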
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Knowledge Graph Question Answering (KGQA) aims to answer natural language questions by reasoning over structured knowledge graphs (KGs). While large language models (LLMs) have advanced KGQA through their strong reasoning capabilities, existing methods still struggle to fully exploit both the rich knowledge encoded in KGs and the reasoning capabilities of LLMs, particularly in complex scenarios. They often assume complete KG coverage and lack mechanisms to judge when external information is needed. Their reasoning also remains locally myopic and fails to maintain coherent multi-step planning, which leads to reasoning failures even when relevant knowledge exists. We propose Graph-RFT, a novel two-stage reinforcement fine-tuning KGQA framework with a 'plan-KGsearch-and-Websearch-during-think' paradigm that enables LLMs to perform autonomous planning and adaptive retrieval scheduling across KG and web sources under incomplete knowledge conditions. Graph-RFT first introduces a chain-of-thought fine-tuning method with a customized plan-retrieval dataset that activates structured reasoning and resolves the GRPO cold-start problem. It then introduces a novel plan-retrieval guided reinforcement learning process that integrates explicit planning and retrieval actions with a multi-reward design, enabling coverage-aware retrieval scheduling. It employs a Cartesian-inspired planning module to decompose complex questions into ordered subquestions, and logical expressions to guide tool invocation for globally consistent multi-step reasoning. This reasoning-retrieval process is optimized with a multi-reward signal that combines outcome and retrieval-specific signals, enabling the model to learn when and how to combine KG and web retrieval effectively.
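Abstract (reference translation): Knowledge Graph Question Answering aims to answer natural language questions by reasoning over structured knowledge graphs. Large language models have advanced KGQA through their strong reasoning capabilities, but existing methods struggle to fully exploit both the rich knowledge encoded in KGs and the reasoning capabilities of LLMs, particularly in complex scenarios. They often assume complete KG coverage and lack a mechanism for judging when external information is needed. Their reasoning remains locally myopic and fails to maintain coherent multi-step planning, so reasoning fails even when relevant knowledge exists. We propose Graph-RFT, a two-stage reinforcement fine-tuning KGQA framework. Its 'plan-KGsearch-and-Websearch-during-think' paradigm enables LLMs to perform autonomous planning and adaptive retrieval scheduling across KG and web sources under incomplete knowledge conditions. Graph-RFT introduces a chain-of-thought fine-tuning method with a customized plan-retrieval dataset, which activates structured reasoning and resolves the GRPO cold-start problem. It then introduces a novel plan-retrieval guided reinforcement learning process that integrates explicit planning and retrieval actions with a multi-reward design, enabling coverage-aware retrieval scheduling. A Cartesian-inspired planning module decomposes complex questions into ordered subquestions, and logical expressions guide tool invocation for globally consistent multi-step reasoning. This reasoning-retrieval process is optimized with a multi-reward signal that combines outcome and retrieval-specific signals, so the model learns when and how to combine KG and web retrieval effectively.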
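
The plan-then-retrieve idea described in the abstract can be illustrated with a small toy sketch. This is not the Graph-RFT implementation: the dictionaries `TOY_KG` and `TOY_WEB`, the `Step` class, the functions `retrieve`, `answer`, and `reward`, and the reward weights are all hypothetical names and values chosen for illustration. The sketch only shows an ordered plan of subquestions, retrieval that tries the KG first and falls back to a web stand-in when the KG has no matching edge, and a reward that combines an outcome signal with a retrieval signal.

```python
from dataclasses import dataclass

# Hypothetical toy KG: (subject, relation) -> object. Deliberately incomplete.
TOY_KG = {
    ("Inception", "directed_by"): "Christopher Nolan",
}

# Hypothetical stand-in for web search, used when the KG lacks an edge.
TOY_WEB = {
    ("Christopher Nolan", "born_in"): "London",
}


@dataclass
class Step:
    """One subquestion in the ordered plan: follow `relation` from the current entity."""
    relation: str


def retrieve(entity, relation):
    """Try the KG first; fall back to the web stand-in if the KG has no matching edge."""
    key = (entity, relation)
    if key in TOY_KG:
        return TOY_KG[key], "kg"
    if key in TOY_WEB:
        return TOY_WEB[key], "web"
    return None, "none"


def answer(topic_entity, plan):
    """Execute the plan in order, feeding each intermediate answer into the next step."""
    entity = topic_entity
    trace = []  # which source answered each step
    for step in plan:
        entity, source = retrieve(entity, step.relation)
        trace.append(source)
        if entity is None:
            break  # no source could answer this subquestion
    return entity, trace


def reward(predicted, gold, trace, w_outcome=1.0, w_retrieval=0.5):
    """Assumed reward shape: answer correctness plus the fraction of steps that retrieved something."""
    outcome = 1.0 if predicted == gold else 0.0
    retrieval = sum(s != "none" for s in trace) / len(trace) if trace else 0.0
    return w_outcome * outcome + w_retrieval * retrieval


# "Where was the director of Inception born?" as two ordered subquestions.
plan = [Step("directed_by"), Step("born_in")]
predicted, trace = answer("Inception", plan)
score = reward(predicted, "London", trace)
```

In this toy run the first subquestion is answered from the KG and the second from the web stand-in, which mirrors the incomplete-KG setting the abstract describes. In the paper the plan and the choice of retrieval source are produced by the trained LLM rather than hard-coded as they are here.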

Related papers