Fugu-MT Paper Translation (Abstract): LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper summary: LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.21065v1
Date: Sun, 22 Mar 2026 05:16:09 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.221478
Title: LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
Title (reference translation): LongCat-Flash-Prover: Strengthening Native Formal Reasoning through Reinforcement Learning with Agentic Tools
Authors: Jianing Wang, Jianfei Zhang, Qi Guo, Linsen Guo, Rumei Li, Chao Zhang, Chong Peng, Cunguang Wang, Dengchang Zhao, Jiarong Shi, Jingang Wang, Liulin Feng, Mengxia Shen, Qi Li, Shengnan An, Shun Wang, Wei Shi, Xiangyu Xi, Xiaoyu Li, Xuezhi Cao, Yi Lu, Yunke Zhao, Zhengyu Chen, Zhimin Lin, Wei Wang, Peng Pei, Xunliang Cai
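Abstract summary: LongCat-Flash-Prover is an open-source MoE model for agentic tool-integrated reasoning. It sets a new state of the art for open-weights models in both auto-formalization and theorem proving. It achieves a 97.1% pass rate on MiniF2F-Test using an inference budget of only 72.
Reference score (independently computed attention score): 46.294745464571456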
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce LongCat-Flash-Prover, a flagship 560-billion-parameter open-source Mixture-of-Experts (MoE) model that advances Native Formal Reasoning in Lean4 through agentic tool-integrated reasoning (TIR). We decompose the native formal reasoning task into three independent formal capabilities, i.e., auto-formalization, sketching, and proving. To facilitate these capabilities, we propose a Hybrid-Experts Iteration Framework to expand high-quality task trajectories, including generating a formal statement based on a given informal problem, producing a whole-proof directly from the statement, or a lemma-style sketch. During agentic RL, we present a Hierarchical Importance Sampling Policy Optimization (HisPO) algorithm, which aims to stabilize MoE model training on such long-horizon tasks. It employs a gradient masking strategy that accounts for policy staleness and the inherent train-inference engine discrepancies at both sequence and token levels. We also incorporate theorem consistency and legality detection mechanisms to eliminate reward hacking issues. Extensive evaluations show that our LongCat-Flash-Prover sets a new state-of-the-art for open-weights models in both auto-formalization and theorem proving. Demonstrating remarkable sample efficiency, it achieves a 97.1% pass rate on MiniF2F-Test using an inference budget of only 72 per problem. On more challenging benchmarks, it solves 70.8% of ProverBench and 41.5% of PutnamBench with no more than 220 attempts per problem, significantly outperforming existing open-weights baselines.
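Abstract (reference translation): This model advances Native Formal Reasoning in Lean4 through agentic tool-integrated reasoning (TIR). We decompose the native formal reasoning task into three independent formal capabilities: auto-formalization, sketching, and proving. To support these capabilities, we propose a Hybrid-Experts Iteration Framework that expands high-quality task trajectories, including generating a formal statement from a given informal problem, producing a whole proof directly from the statement, or producing a lemma-style sketch. For agentic RL, we present the Hierarchical Importance Sampling Policy Optimization (HisPO) algorithm. It employs a gradient masking strategy that accounts for policy staleness and the inherent discrepancies between the training and inference engines at both the sequence and token levels. We also incorporate theorem consistency and legality detection mechanisms to eliminate reward hacking. We show that LongCat-Flash-Prover sets a new state of the art for open-weights models in both auto-formalization and theorem proving. Demonstrating remarkable sample efficiency, it achieves a 97.1% pass rate on MiniF2F-Test with an inference budget of 72. On more challenging benchmarks, it solves 70.8% of ProverBench and 41.5% of PutnamBench, significantly outperforming existing open-weights baselines.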

Related papers