Fugu-MT Paper Translation (Abstract): Pseudo-Formalization for Automatic Proof Verification

Paper overview: Pseudo-Formalization for Automatic Proof Verification

arxiv url: http://arxiv.org/abs/2605.20531v1
Date: Tue, 19 May 2026 22:08:51 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.394984
Title: Pseudo-Formalization for Automatic Proof Verification
Title (reference translation): Pseudo-Formalization for Automatic Proof Verification
Authors: Slim Barkallah, Luke Bailey, Kaiyue Wen, Mohammed Abouzaid, Tengyu Ma,
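Abstract summary: Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Pseudo-Formalization (PF) is a proof format that captures the modularity and precision of formal proofs. To support future work, we release the research-level proof verification benchmark ArxivMathGradingBench.
Reference score (independently computed attention metric): 17.612188352560494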
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.
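Abstract (reference translation): Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs in languages such as Lean are easy to verify because they are unambiguous and modular. Most proofs, especially those written by AI systems, have neither property, and translating them into formal languages remains difficult in many frontier math settings. Pseudo-Formalization (PF) is a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of an ordinary natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm called Block Verification (BV). PF+BV is evaluated on two benchmarks spanning olympiad and research-level mathematics, where it Pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release the research-level proof verification benchmark ArxivMathGradingBench.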
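
The module structure and the per-module check described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the field names, the `Verifier` signature, and `toy_verifier` are all assumptions, and the step where an LLM translates a natural language proof into modules is left out. In practice `verify` would wrap an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Module:
    """One self-contained block of a Pseudo-Formal proof (field names are assumed)."""
    name: str
    premises: List[str]
    conclusion: str
    proof: str


# Hypothetical verifier signature: returns True when the module's proof
# is judged to establish its conclusion from its stated premises alone.
Verifier = Callable[[Module], bool]


def block_verify(modules: List[Module], verify: Verifier) -> List[str]:
    """Check every module independently and return the names of those that fail."""
    return [m.name for m in modules if not verify(m)]


def toy_verifier(module: Module) -> bool:
    """Stand-in for an LLM judge: rejects only modules whose proof text is empty."""
    return bool(module.proof.strip())


proof = [
    Module("lemma_1", ["n is even"], "n^2 is even",
           "Write n = 2k, so n^2 = 4k^2 = 2(2k^2)."),
    Module("lemma_2", ["n^2 is even"], "n^2 + 1 is odd", ""),
]

failed = block_verify(proof, toy_verifier)
print(failed)  # ['lemma_2']
```

Because each module carries its own premises, a failure is localized to a named block rather than to the proof as a whole, which is the property the abstract attributes to formal proofs.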

Related papers