Fugu-MT Paper Translation (Abstract): Hilbert: Recursively Building Formal Proofs with Informal Reasoning

Paper abstract: Hilbert: Recursively Building Formal Proofs with Informal Reasoning

arxiv url: http://arxiv.org/abs/2509.22819v1
Date: Fri, 26 Sep 2025 18:24:23 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:18.890899
Title: Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Title (reference translation): Hilbert: Recursively Building Formal Proofs Using Informal Reasoning
Authors: Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, Ke Ye
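Abstract summary: Large language models (LLMs) show impressive mathematical reasoning abilities, but their solutions often contain errors that cannot be automatically verified. We introduce Hilbert, an agentic framework that combines the complementary strengths of informal reasoning and formal verification. Our system orchestrates four components: an informal LLM that excels at mathematical reasoning, a specialized prover LLM optimized for Lean 4 tactics, a formal verifier, and a semantic theorem retriever.
Reference score (independently computed attention score): 38.36481253622752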
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically verified. Formal theorem proving systems such as Lean 4 offer automated verification with complete accuracy, motivating recent efforts to build specialized prover LLMs that generate verifiable proofs in formal languages. However, a significant gap remains: current prover LLMs solve substantially fewer problems than general-purpose LLMs operating in natural language. We introduce Hilbert, an agentic framework that bridges this gap by combining the complementary strengths of informal reasoning and formal verification. Our system orchestrates four components: an informal LLM that excels at mathematical reasoning, a specialized prover LLM optimized for Lean 4 tactics, a formal verifier, and a semantic theorem retriever. Given a problem that the prover is unable to solve, Hilbert employs recursive decomposition to split the problem into subgoals that it solves with the prover or reasoner LLM. It leverages verifier feedback to refine incorrect proofs as necessary. Experimental results demonstrate that Hilbert substantially outperforms existing approaches on key benchmarks, achieving 99.2% on miniF2F, 6.6% points above the best publicly available method. Hilbert achieves the best known result on PutnamBench. It solves 462/660 problems (70.0%), outperforming proprietary approaches like SeedProver (50.4%) and achieving a 422% improvement over the best publicly available baseline. Thus, Hilbert effectively narrows the gap between informal reasoning and formal proof generation.
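Abstract (reference translation): Large language models (LLMs) show impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically verified. Formal theorem proving systems such as Lean 4 provide automated verification with complete accuracy. However, a significant gap remains: current prover LLMs solve substantially fewer problems than general-purpose LLMs operating in natural language. We introduce Hilbert, an agentic framework that bridges this gap by combining the complementary strengths of informal reasoning and formal verification. Our system orchestrates four components: an informal LLM that excels at mathematical reasoning, a specialized prover LLM optimized for Lean 4 tactics, a formal verifier, and a semantic theorem retriever. Given a problem that the prover cannot solve, Hilbert uses recursive decomposition to split it into subgoals that can be solved by the prover or reasoner LLM. It uses verifier feedback to refine incorrect proofs as necessary. Experiments show that Hilbert substantially outperforms existing approaches on key benchmarks, achieving 99.2% on miniF2F. Hilbert achieves the best known result on PutnamBench. It solves 462/660 problems (70.0%), outperforming proprietary approaches such as SeedProver (50.4%) and improving on the best publicly available baseline by 422%. Thus, Hilbert effectively narrows the gap between informal reasoning and formal proof generation.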
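
The recursive decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `prove`, the callable types, the `max_depth` budget, and the toy components are all hypothetical names introduced here for illustration. The sketch shows only the control flow (try the prover, check with the verifier, otherwise split into subgoals and recurse); it leaves out the semantic theorem retriever and the refinement of incorrect proofs from verifier feedback.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins for the components named in the abstract.
Prover = Callable[[str], Optional[str]]   # goal -> candidate proof, or None
Verifier = Callable[[str, str], bool]     # (goal, proof) -> accepted?
Decomposer = Callable[[str], List[str]]   # goal -> subgoals (reasoner's role)


def prove(goal: str, prover: Prover, verifier: Verifier,
          decompose: Decomposer, depth: int = 0,
          max_depth: int = 3) -> Optional[str]:
    """Return a verifier-approved proof of `goal`, or None if none is found."""
    # Step 1: try the prover directly; accept only verified output.
    candidate = prover(goal)
    if candidate is not None and verifier(goal, candidate):
        return candidate
    # Step 2: stop once the (assumed) recursion budget is spent.
    if depth >= max_depth:
        return None
    # Step 3: split the goal into subgoals and prove each one recursively.
    subgoals = decompose(goal)
    if not subgoals:
        return None
    sub_proofs = []
    for sub in subgoals:
        proof = prove(sub, prover, verifier, decompose, depth + 1, max_depth)
        if proof is None:
            return None  # one unsolved subgoal fails the whole goal
        sub_proofs.append(proof)
    # Step 4: assemble the pieces. A real system would re-check the
    # assembled proof with the verifier; this toy just concatenates.
    return "\n".join(sub_proofs)


# Toy components: the "prover" only handles goals without " and ".
def toy_prover(goal: str) -> Optional[str]:
    return f"proof({goal})" if " and " not in goal else None


def toy_verifier(goal: str, proof: str) -> bool:
    return proof == f"proof({goal})"


def toy_decompose(goal: str) -> List[str]:
    return goal.split(" and ") if " and " in goal else []


result = prove("a and b and c", toy_prover, toy_verifier, toy_decompose)
print(result)
```

In this toy run the compound goal fails at the prover, is split into three atomic subgoals, and each one is then proved and verified on its own. Setting `max_depth=0` disables decomposition, so the same goal returns `None`.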

Related papers