Fugu-MT Paper Translation (Abstract): Learning to Reason in Structured In-context Environments with Reinforcement Learning

Paper summary: Learning to Reason in Structured In-context Environments with Reinforcement Learning

arxiv url: http://arxiv.org/abs/2509.23330v1
Date: Sat, 27 Sep 2025 14:34:19 GMT
Status: Translation complete
System update date: 2025-09-30 22:32:19.165685
Title: Learning to Reason in Structured In-context Environments with Reinforcement Learning
Title (reference translation): Learning to Reason in Structured In-context Environments Using Reinforcement Learning
Authors: Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen
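Abstract summary: Large language models (LLMs) have substantially improved their reasoning capabilities through reinforcement learning (RL). The paper introduces the Structured In-context Environment (SIE) framework. SIE achieves scalability by automatically constructing reasoning environments from large-scale structured data.
Reference score (independently computed attention measure): 45.96068681848423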
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have achieved significant advancements in reasoning capabilities through reinforcement learning (RL) via environmental exploration. As the intrinsic properties of the environment determine the abilities that LLMs can learn, the environment plays an important role in the RL fine-tuning process. An ideal LLM reasoning environment should possess three core characteristics: scalability, generalizable reasoning, and verifiability. However, existing mathematical and coding environments are difficult to scale due to heavy reliance on expert annotation, while the skills learned in game-based environments are too specialized to generalize. To bridge this gap, we introduce the Structured In-context Environment (SIE) framework. SIE achieves scalability by automatically constructing reasoning environments from large-scale structured data, where the rich compositional patterns naturally support generalizable reasoning. Moreover, the explicit schemas and reasoning chains in structured data provide a foundation for rule-based verifiability. Experimental results show that the SIE framework not only achieves substantial improvements in in-domain structured reasoning, but also enables the learned compositional reasoning skills to generalize effectively to out-of-domain mathematical and logical reasoning tasks. We further explored learning in information-limited partial SIEs and found that LLMs can infer the missing information through exploring the environment, leading to robust reasoning improvements and generalization performance.
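Abstract (reference translation): Large language models (LLMs) have substantially improved their reasoning capabilities through reinforcement learning (RL) via environmental exploration. Because the intrinsic properties of the environment determine the abilities that LLMs can learn, the environment plays an important role in the RL fine-tuning process. An ideal LLM reasoning environment should have three core characteristics: scalability, generalizable reasoning, and verifiability. However, existing mathematical and coding environments are difficult to scale because they rely heavily on expert annotation, while the skills learned in game-based environments are too specialized to generalize. To bridge this gap, we introduce the Structured In-context Environment (SIE) framework. SIE achieves scalability by automatically constructing reasoning environments from large-scale structured data. Furthermore, the explicit schemas and reasoning chains in structured data provide a foundation for rule-based verifiability. Experimental results show that the SIE framework not only achieves substantial improvements in in-domain structured reasoning, but also enables the learned compositional reasoning skills to generalize effectively to out-of-domain mathematical and logical reasoning tasks. We further examined learning in information-limited partial SIEs and found that LLMs can infer the missing information by exploring the environment, leading to robust reasoning improvements and generalization performance.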

Related papers