Fugu-MT paper translation (abstract): STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

Paper overview: STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision

arxiv url: http://arxiv.org/abs/2605.19338v1
Date: Tue, 19 May 2026 04:20:43 GMT
Status: translation complete
Last updated in system: 2026-05-20 15:03:09.116746
Title: STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision
Authors: Jiaao Wu, Xian Zhang, Hanzhang Liu, Sophia Zhang, Fan Yang, Yinpeng Dong,
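Abstract summary: We introduce STAR-PólyaMath, a multi-agent framework for extended, long-horizon reasoning. STAR-PólyaMath is structured as a state machine with nested challenge-step-replan loops. It achieves state-of-the-art results on all eight top-tier competition benchmarks, with perfect scores on the AIMEs, Putnam, and HMMT.
Reference score (attention score computed independently by this site): 25.371500356523896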
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Frontier AI models and multi-agent systems have led to significant improvements in mathematical reasoning. However, for problems requiring extended, long-horizon reasoning, existing systems continue to suffer from fundamental reliability issues: hallucination accumulation, memory fragmentation, and imbalanced reasoning-tool trade-offs. In this paper, we introduce STAR-PólyaMath, a multi-agent framework that systematically addresses these challenges through meta-level supervision and structured Reasoner-Verifier interaction. STAR-PólyaMath is structured as an orchestrated state machine with nested challenge-step-replan loops, governed by a reasoning-free Python orchestrator that separates control from inference and bounds error propagation through trace-back and re-planning. Our key innovation is a persistent Meta-Strategist that maintains cross-attempt memory and exercises meta-level control by issuing high-level strategic guidance or mandatory directives, so the system can escape unproductive loops rather than stagnate or over-rely on tools. STAR-PólyaMath achieves state-of-the-art results on all eight top-tier competition benchmarks: AIME 2025-2026, MathArena Apex Shortlist, MathArena Apex 2025, Putnam 2025, IMO 2025, HMMT February 2026, and USAMO 2026. It obtains perfect scores on AIMEs, Putnam, and HMMT, and shows its largest margin on Apex 2025, scoring 93.75% compared with 80.21% by the strongest baseline GPT-5.5. Ablation studies show that the gains arise from the framework's orchestration rather than from model-level diversity since removing key components or substituting in mixed backbones consistently weakens performance. Code is available at https://github.com/Julius-Woo/STAR-PolyaMath.
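The control structure described in the abstract (a reasoning-free orchestrator driving nested challenge, step, and re-plan loops, with a Meta-Strategist whose memory persists across attempts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`orchestrate`, `MetaStrategist`, `toy_reasoner`, the loop limits, the escalation rule from hint to directive) is a hypothetical stand-in, the agents are plain callables rather than model calls, and trace-back is reduced to discarding the steps of a failed attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical agent signatures; in a real system these would wrap model calls.
Reasoner = Callable[[str, List[str], str], str]   # (problem, accepted steps, guidance) -> candidate step
Verifier = Callable[[str, List[str], str], bool]  # (problem, accepted steps, candidate) -> accept?


@dataclass
class MetaStrategist:
    """Keeps memory across attempts and turns past failures into guidance."""
    memory: List[str] = field(default_factory=list)

    def record_failure(self, note: str) -> None:
        self.memory.append(note)

    def guidance(self) -> str:
        if not self.memory:
            return "no guidance"
        # Assumed rule: escalate from a hint to a mandatory directive
        # once two or more attempts have failed.
        kind = "DIRECTIVE" if len(self.memory) >= 2 else "hint"
        return f"{kind}: avoid {self.memory[-1]}"


def orchestrate(problem: str, reasoner: Reasoner, verifier: Verifier,
                strategist: MetaStrategist, is_done: Callable[[List[str]], bool],
                max_attempts: int = 3, max_steps: int = 5,
                max_challenges: int = 2) -> Optional[List[str]]:
    """Pure control flow: the orchestrator itself performs no reasoning."""
    for attempt in range(max_attempts):               # re-plan loop
        steps: List[str] = []                         # a new attempt starts clean
        guidance = strategist.guidance()
        failed = False
        while len(steps) < max_steps and not failed:  # step loop
            accepted = False
            for _ in range(max_challenges):           # challenge loop
                candidate = reasoner(problem, steps, guidance)
                if verifier(problem, steps, candidate):
                    steps.append(candidate)
                    accepted = True
                    break
            if not accepted:
                strategist.record_failure(
                    f"attempt {attempt}: stuck after {len(steps)} step(s)")
                failed = True
            elif is_done(steps):
                return steps
        if not failed:
            strategist.record_failure(f"attempt {attempt}: step budget exhausted")
    return None


# Toy agents: the reasoner only produces acceptable steps once it has guidance.
def toy_reasoner(problem: str, steps: List[str], guidance: str) -> str:
    return "bad" if guidance == "no guidance" else f"good-{len(steps)}"


def toy_verifier(problem: str, steps: List[str], candidate: str) -> bool:
    return candidate.startswith("good")


strategist = MetaStrategist()
result = orchestrate("toy problem", toy_reasoner, toy_verifier, strategist,
                     is_done=lambda s: len(s) == 2)
print(result)  # -> ['good-0', 'good-1']
```

In this toy run the first attempt stalls in the challenge loop, the failure is written to the strategist's memory, and the second attempt succeeds because the guidance changes the reasoner's behaviour. Keeping the loop limits and state transitions in plain code, outside the agents, is what the abstract means by separating control from inference.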

Related papers