Fugu-MT Paper Translation (Summary): Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

Paper summary: Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm

arxiv url: http://arxiv.org/abs/2509.23946v1
Date: Sun, 28 Sep 2025 15:48:40 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.54788
Title: Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
Title (reference translation): Explore-Execute Chain: Towards an Efficient Structured Reasoning Paradigm
Authors: Kaisen Yang, Lixuan He, Rushi Shah, Kaicheng Yang, Qinwei Ma, Dianbo Liu, Alex Lamb
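Abstract summary: Chain-of-Thought (CoT) and its variants have markedly advanced the reasoning abilities of Large Language Models (LLMs). $E^2C$ (Explore-Execute Chain) is a structured reasoning framework that decouples reasoning into two distinct phases.
Reference score (independently computed attention score): 8.405729585427226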
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Chain-of-Thought (CoT) and its variants have markedly advanced the reasoning abilities of Large Language Models (LLMs), yet their monolithic and auto-regressive architecture inherently conflates high-level strategic planning with low-level step-by-step execution, leading to computational inefficiency, limited exploration of reasoning paths, and reduced interpretability. To overcome these issues, we propose the Explore-Execute Chain ($E^2C$), a structured reasoning framework that decouples reasoning into two distinct phases: an exploratory phase that stochastically generates succinct high-level plans, followed by an execution phase that deterministically carries out the chosen plan. Our approach incorporates a two-stage training methodology, which combines Supervised Fine-Tuning (SFT) - augmented by a novel data generation algorithm enforcing strict plan adherence - with a subsequent Reinforcement Learning (RL) stage that capitalizes on the informativeness of exploration and reinforces the determinism of execution. This decomposition enables an efficient test-time scaling strategy: on AIME'2024, $E^2C$ Test Time Scaling reaches 58.1% accuracy using <10% of the decoding tokens required by comparable methods (e.g., Forest-of-Thought), sharply cutting self-consistency overhead. For cross-domain adaptation, our Exploration-Focused SFT (EF-SFT) fine-tunes with only 3.5% of the tokens used by standard SFT yet yields up to 14.5% higher accuracy than standard SFT on medical benchmarks, delivering state-of-the-art performance, strong generalization, and greater interpretability by separating planning from execution. The code and pre-trained models for the project are available at: https://github.com/yks23/Explore-Execute-Chain.git
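Abstract (reference translation): Chain-of-Thought (CoT) and its variants have greatly improved the reasoning abilities of Large Language Models (LLMs), but their monolithic, auto-regressive architecture inherently mixes high-level strategic planning with low-level step-by-step execution, which leads to computational inefficiency, limited exploration of reasoning paths, and reduced interpretability. To overcome these problems, the authors propose the Explore-Execute Chain ($E^2C$), a structured reasoning framework that separates reasoning into two distinct phases: an exploration phase that stochastically generates concise high-level plans, and an execution phase that deterministically carries out the chosen plan. The approach uses a two-stage training methodology that combines Supervised Fine-Tuning (SFT), augmented by a new data generation algorithm that enforces strict plan adherence, with a subsequent Reinforcement Learning (RL) stage that exploits the informativeness of exploration and reinforces the determinism of execution. This decomposition enables an efficient test-time scaling strategy: on AIME'2024, $E^2C$ Test Time Scaling reaches 58.1% accuracy while using less than 10% of the decoding tokens required by comparable methods (e.g., Forest-of-Thought). For cross-domain adaptation, Exploration-Focused SFT (EF-SFT) fine-tunes with only 3.5% of the tokens used by standard SFT yet achieves up to 14.5% higher accuracy than standard SFT on medical benchmarks, delivering state-of-the-art performance, strong generalization, and improved interpretability by separating planning from execution. The project's code and pre-trained models are available at the repository URL given in the abstract above.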
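
The two-phase idea in the abstract (sample several short plans stochastically, then execute one chosen plan deterministically) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the majority-vote plan selection, the stand-in "model" callables, and the token counts are all assumptions made purely for illustration. The cost helpers only show why sampling short plans could be cheaper than sampling full solutions; they do not reproduce the paper's reported numbers.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical stand-ins for a language model; none of this comes from the paper.
STRATEGIES = ["use modular arithmetic", "enumerate cases", "set up a recurrence"]


def toy_sample_plan(question: str, rng: random.Random) -> str:
    """Stochastic exploration: return one short high-level plan."""
    return rng.choice(STRATEGIES)


def toy_execute_plan(question: str, plan: str) -> str:
    """Deterministic execution: same (question, plan) always gives the same output."""
    return f"[{plan}] applied to: {question}"


def explore(sample_plan: Callable[[str, random.Random], str],
            question: str, k: int, rng: random.Random) -> List[str]:
    """Exploration phase: sample k candidate plans."""
    return [sample_plan(question, rng) for _ in range(k)]


def select_plan(plans: List[str]) -> str:
    """Assumed selection rule: most frequent plan, ties go to the earliest seen."""
    return Counter(plans).most_common(1)[0][0]


def explore_execute(question: str,
                    sample_plan: Callable[[str, random.Random], str],
                    execute_plan: Callable[[str, str], str],
                    k: int = 8, seed: int = 0) -> Tuple[str, str]:
    """Sample k plans, choose one, and execute only the chosen plan."""
    rng = random.Random(seed)
    chosen = select_plan(explore(sample_plan, question, k, rng))
    return chosen, execute_plan(question, chosen)


def decoding_tokens_e2c(k: int, plan_tokens: int, exec_tokens: int) -> int:
    """k short plans plus a single execution."""
    return k * plan_tokens + exec_tokens


def decoding_tokens_self_consistency(k: int, plan_tokens: int, exec_tokens: int) -> int:
    """k complete solutions, each containing planning and execution."""
    return k * (plan_tokens + exec_tokens)


if __name__ == "__main__":
    plan, answer = explore_execute("toy question", toy_sample_plan, toy_execute_plan)
    print(plan)
    print(answer)
    # Hypothetical sizes: 8 samples, 50-token plans, 2000-token executions.
    print(decoding_tokens_e2c(8, 50, 2000))
    print(decoding_tokens_self_consistency(8, 50, 2000))
```

Under these made-up sizes, exploring costs 8 × 50 + 2000 = 2400 decoding tokens, against 8 × (50 + 2000) = 16400 for sampling eight full solutions. The gap grows as execution becomes longer relative to the plan.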
