Fugu-MT 論文翻訳(概要): Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training

論文の概要: Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training

arxiv url: http://arxiv.org/abs/2511.13043v1
Date: Mon, 17 Nov 2025 06:44:02 GMT
ステータス: 翻訳完了
システム内更新日: 2025-11-18 14:36:24.786782
Title: Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training
Title（参考訳）: Spark-Prover-X1: 各種データトレーニングによる形式理論の証明
Authors: Xinyuan Zhou, Yi Lei, Xiaoyu Zhou, Jingyi Sun, Yu Zhu, Zhongyi Ye, Weitai Zhang, Quan Liu, Si Wei, Cong Liu,
Abstract要約: Spark-Prover-X1は、3段階のフレームワークを用いてトレーニングされた7Bパラメータモデルで、中等サイズのLCMの推論可能性を解き放つ。重要なイノベーションは、きめ細かい推論を達成するための"CoT拡張状態予測"タスクである。また、402のフォーマルな問題のベンチマークデータセットであるExamFormal-Benchを紹介します。
参考スコア（独自算出の注目度）: 24.74642687780046
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have shown significant promise in automated theorem proving, yet progress is often constrained by the scarcity of diverse and high-quality formal language data. To address this issue, we introduce Spark-Prover-X1, a 7B parameter model trained via an three-stage framework designed to unlock the reasoning potential of more accessible and moderately-sized LLMs. The first stage infuses deep knowledge through continuous pre-training on a broad mathematical corpus, enhanced by a suite of novel data tasks. Key innovation is a "CoT-augmented state prediction" task to achieve fine-grained reasoning. The second stage employs Supervised Fine-tuning (SFT) within an expert iteration loop to specialize both the Spark-Prover-X1-7B and Spark-Formalizer-X1-7B models. Finally, a targeted round of Group Relative Policy Optimization (GRPO) is applied to sharpen the prover's capabilities on the most challenging problems. To facilitate robust evaluation, particularly on problems from real-world examinations, we also introduce ExamFormal-Bench, a new benchmark dataset of 402 formal problems. Experimental results demonstrate that Spark-Prover-X1-7B achieves state-of-the-art performance among similarly-sized open-source models, attaining a 37.0\% average pass rate (pass@32). It shows exceptional performance on difficult competition benchmarks, notably solving 27 problems on PutnamBench (pass@32) and achieving 24.0\% on CombiBench (pass@32). Our work validates that this diverse training data and progressively refined training pipeline provides an effective path for enhancing the formal reasoning capabilities of lightweight LLMs. Both Spark-Prover-X1-7B and Spark-Formalizer-X1-7B, along with the ExamFormal-Bench dataset, are made publicly available at:https://www.modelscope.cn/organization/iflytek, https://gitcode.com/ifly_opensource.
Abstract（参考訳）: 大規模言語モデル(LLM)は、自動定理の証明において大きな可能性を示しているが、多種多様な高品質の形式言語データの不足により、進歩が制限されることがしばしばある。この問題に対処するために,我々はSpark-Prover-X1を紹介した。Spark-Prover-X1は,よりアクセシブルで適度なサイズのLLMの推論可能性の解放を目的とした,3段階のフレームワークを通じてトレーニングされた7Bパラメータモデルである。第1段階は、幅広い数学的コーパスの継続的な事前学習を通じて深い知識を注入し、一連の新しいデータタスクによって強化される。重要なイノベーションは、きめ細かい推論を達成するための"CoT拡張状態予測"タスクである。第2段階では、Spark-Prover-X1-7BとSpark-Formalizer-X1-7Bの両方を専門とする専門家の反復ループにスーパービジョン・ファインチューニング(SFT)を採用している。最後に,グループ相対政策最適化(GRPO)の目標ラウンドを適用し,最も困難な問題に対する証明者の能力を向上する。実世界の検査における問題に対するロバストな評価を容易にするため,402のフォーマルな問題のベンチマークデータセットであるExamFormal-Benchも導入した。実験結果から,Spark-Prover-X1-7Bは,同規模のオープンソースモデルにおいて,37.0\%の平均パスレート(pass@32。特にPatnamBench(pass@32)で27の問題を解決し、CombiBench(pass@32)で24.0\%を達成した。我々の研究は、この多様なトレーニングデータと漸進的に改良されたトレーニングパイプラインが、軽量LLMの形式的推論能力を高める効果的な経路であることを示す。 Spark-Prover-X1-7BとSpark-Formalizer-X1-7B、ExamFormal-Benchデータセットはいずれも、https://www.modelscope.cn/organization/iflytek, https://gitcode.com/ifly_opensourceで公開されている。

論文の概要: Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training

関連論文リスト