Fugu-MT Paper Translation (Abstract): GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Paper summary: GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

arxiv url: http://arxiv.org/abs/2510.11769v1
Date: Mon, 13 Oct 2025 17:56:25 GMT
Status: Translation complete
System update date: 2025-10-15 19:02:32.041137
Title: GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
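Title (reference translation): GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Authors: Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
Abstract summary: This paper proposes GAR: Generative Adversarial Reinforcement Learning. GAR introduces an implicit curriculum learning mechanism that aligns task difficulty with the prover's evolving capability. With GAR training, Goedel-Prover-V2-8B and DeepSeek-Prover-V2-7B achieve an average relative improvement in pass@32 of 4.20% on the MiniF2F-Test benchmark.
Reference score (independently computed attention score): 23.743060792178067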
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Solving math problems through verifiable languages such as Lean has significantly impacted both the mathematics and computer science communities. Current state-of-the-art models are often trained with expensive online Reinforcement Learning (RL) or expert iteration. However, these approaches rely on fixed problem sets, which makes training inefficient and limits the model's ability to tackle complex problems. To overcome these limitations, we propose GAR: Generative Adversarial Reinforcement learning, a comprehensive RL training framework that jointly trains the problem composer and solver in an adversarial loop. GAR introduces an implicit curriculum learning mechanism, which aligns task difficulty with the prover's evolving capability. It thereby improves training efficiency and enables stronger performance in proving advanced theorems. Experiments show that with GAR training, Goedel-Prover-V2-8B and DeepSeek-Prover-V2-7B achieve an average relative improvement in pass@32 of 4.20% on the MiniF2F-Test benchmark, while DeepSeek-Prover-V2's pass@32 on ProofNet-Test increases from 22.58% to 25.81%. Beyond formal proving, GAR establishes a general RL paradigm for co-evolution of problem generation and solving under verifiable environments.
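Abstract (reference translation): Solving math problems through verifiable languages such as Lean has had a significant impact on the mathematics and computer science communities. Current state-of-the-art models are often trained with expensive online reinforcement learning (RL) or expert iteration. However, these approaches rely on fixed problem sets, which makes training inefficient and limits the model's ability to tackle complex problems. To overcome these limitations, we propose GAR: Generative Adversarial Reinforcement Learning, a comprehensive RL training framework. GAR introduces an implicit curriculum learning mechanism that aligns task difficulty with the prover's evolving capability. This improves training efficiency and performance in proving advanced theorems. With GAR training, Goedel-Prover-V2-8B and DeepSeek-Prover-V2-7B achieve an average relative improvement in pass@32 of 4.20% on the MiniF2F-Test benchmark, and DeepSeek-Prover-V2's pass@32 on ProofNet-Test increases from 22.58% to 25.81%. Beyond formal proving, GAR establishes a general RL paradigm for the co-evolution of problem generation and problem solving in verifiable environments.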
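Note on the reported numbers: the MiniF2F-Test figure (4.20%) is a relative improvement, while the ProofNet-Test figures (22.58% to 25.81%) are absolute pass rates. The following is a minimal sketch of how the two relate, not the authors' evaluation code. The abstract does not define pass@32, so the definition used here (a problem counts as solved if any of its first k sampled proofs verifies) is an assumption, and the function names `pass_at_k` and `relative_improvement` are hypothetical.

```python
from typing import List


def pass_at_k(results: List[List[bool]], k: int) -> float:
    """Fraction of problems solved by at least one of the first k attempts.

    `results[i][j]` is True if attempt j on problem i was verified.
    This empirical definition is an assumption, not taken from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not results:
        raise ValueError("results must not be empty")
    solved = sum(1 for attempts in results if any(attempts[:k]))
    return solved / len(results)


def relative_improvement(before: float, after: float) -> float:
    """Relative change of `after` over `before`, as a fraction."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before


# ProofNet-Test pass rates quoted in the abstract (in percent).
before, after = 22.58, 25.81
absolute_gain = after - before                       # percentage points
relative_gain = relative_improvement(before, after)  # fraction of baseline
print(f"absolute: {absolute_gain:.2f} points, relative: {relative_gain:.1%}")
```

Under these assumptions, the ProofNet-Test change is 3.23 percentage points in absolute terms, or roughly 14.3% relative to the 22.58% baseline.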

Related papers