Fugu-MT Paper Translation (Abstract): From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning

Paper overview: From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2512.24532v1
Date: Wed, 31 Dec 2025 00:36:03 GMT
Status: Translation complete
System update date: 2026-01-01 23:27:28.520086
Title: From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
Title (reference translation): From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
Authors: Amir Tahmasbi, Sadegh Majidi, Kazem Taram, Aniket Bera
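Abstract summary: We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn policies that compose these building blocks for multi-step planning.
Reference score (independently computed attention metric): 10.98910502098502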
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatial reasoning in large language models (LLMs) has gained increasing attention due to applications in navigation and planning. Despite strong general language capabilities, LLMs still struggle with spatial transformations and multi-step planning in structured environments. We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn policies that compose these building blocks for multi-step planning in puzzle-based environments, in a closed-loop manner. To support this pipeline, we synthesize an ASCII-art dataset and construct a corresponding ASCII-based reinforcement learning environment. Our method consistently outperforms baselines, including the generic backbone, physics-aware model, and end-to-end RL models, under both Dynamic environments with explicit state updates and Static environments where the model must rely on its internal state across steps. In addition, the proposed approach converges faster and exhibits more stable training compared to end-to-end reinforcement learning from scratch. Finally, we analyze attention patterns to assess whether fine-tuning induces meaningful improvements in spatial understanding.
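Abstract (reference translation): Spatial reasoning in large language models (LLMs) has attracted attention because of its applications in navigation and planning. Although their language capabilities are strong, LLMs struggle with spatial transformations and multi-step planning in structured environments. We propose a two-stage approach that decomposes spatial reasoning into atomic building blocks and their composition. First, we apply supervised fine-tuning on elementary spatial transformations, such as rotation, translation, and scaling, to equip the model with basic spatial physics. We then freeze this physics-aware model and train lightweight LoRA adapters within the GRPO framework to learn, in a closed-loop manner, policies that compose these building blocks in puzzle-based environments. To support this pipeline, we synthesize an ASCII-art dataset and construct a corresponding ASCII-based reinforcement learning environment. The proposed method consistently outperforms baselines, including the generic backbone, the physics-aware model, and end-to-end RL models, in both Dynamic and Static environments. In addition, the proposed approach converges faster and trains more stably than end-to-end reinforcement learning from scratch. Finally, we analyze attention patterns to assess whether fine-tuning brings meaningful improvements in spatial understanding.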
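
The abstract names rotation, translation, and scaling as the elementary transformations learned in the first stage, and describes the second stage as composing them. As a rough illustration only, the sketch below applies these three operations to a small ASCII grid and chains them. The grid encoding (a list of equal-length strings with `.` as background) and the function names `rotate_cw`, `translate`, and `scale` are assumptions made for this example; the abstract does not specify the dataset format or the environment API.

```python
from typing import List

# Assumed encoding: each string is one row of a rectangular ASCII grid.
Grid = List[str]


def rotate_cw(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    width = len(grid[0])
    # New row c is old column c, read from the bottom row up.
    return ["".join(row[c] for row in reversed(grid)) for c in range(width)]


def translate(grid: Grid, dx: int, dy: int, fill: str = ".") -> Grid:
    """Shift content dx columns right and dy rows down.

    Cells pushed past the border are dropped; vacated cells get `fill`.
    """
    height, width = len(grid), len(grid[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = grid[y][x]
    return ["".join(row) for row in out]


def scale(grid: Grid, k: int) -> Grid:
    """Enlarge the grid by an integer factor k (each cell becomes k x k)."""
    return ["".join(ch * k for ch in row) for row in grid for _ in range(k)]


if __name__ == "__main__":
    start = ["#.", ".."]
    # A two-step "plan": rotate, then shift down by one row.
    step1 = rotate_cw(start)
    step2 = translate(step1, dx=0, dy=1)
    for name, g in [("start", start), ("rotated", step1), ("shifted", step2)]:
        print(name)
        print("\n".join(g))
```

In this toy setting a multi-step plan is just a sequence of such calls; the paper's policy would have to choose that sequence, which this sketch does not attempt.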
